Artificial intelligence system with intuitive interactive interfaces for guided labeling of training data for machine learning models

ABSTRACT

At an artificial intelligence system, during a labeling feedback session, a visualization data set is presented via a programmatic interface. The visualization data set comprises a representation of data items for which labeling feedback is requested for generating a training set of a classifier. At least one of the data items is selected based on an estimated rank with respect to a metric associated with including the data item in a training set. During the session, respective labels for the data items and a filter criterion to be used to select additional data items are obtained. A classifier trained using the labels is stored.

RELATED APPLICATION

This application is a continuation of U.S. application Ser. No. 16/008,897, filed Jun. 14, 2018, which is incorporated herein by reference in its entirety.

BACKGROUND

Machine learning combines techniques from statistics and artificial intelligence to create algorithms that can learn from empirical data and generalize to solve problems in various domains such as natural language processing, financial fraud detection, terrorism threat level detection, human health diagnosis and the like. In recent years, more and more raw data that can potentially be utilized for machine learning models is being collected from a large variety of sources, such as sensors of various kinds, web server logs, social media services, financial transaction records, security cameras, and the like.

Classification, or the task of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of learning from a training set of data containing observations or examples whose category membership is known, is one of the most useful and often-used categories of machine learning techniques. A number of algorithms for classification of different levels of sophistication have been developed over the years, including, for example, linear classifiers such as logistic regression algorithms, Bayesian classifiers, support vector machines, decision-tree based algorithms, neural network-based algorithms and the like.

For many classification problem domains, a very large number of unlabeled observations or examples may be available, and labels may have to be assigned to at least a subset of the examples to generate an appropriate training data set for the particular classification algorithm being used. In order to assign the labels, depending on the complexity of the problem, in some cases subject matter experts may have to be employed. For example, to label some types of medical records to indicate the likely presence or absence of a disease, the assistance of medical professionals may be required. Even in scenarios where the task of distinguishing among classes is less complex, generating sufficient numbers of labeled examples may require substantial human input. Furthermore, it is sometimes hard to determine the number of training examples that may eventually be required to train a classifier that meets targeted quality requirements, since the extent to which different examples assist in the model's learning may differ. As a result of these and other factors, generating a training set of labeled examples may often represent a non-trivial technical and resource usage challenge.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 illustrates an example system environment in which a classification service may be implemented, according to at least some embodiments.

FIG. 2 illustrates example components of a training subsystem of a classification service, according to at least some embodiments.

FIG. 3 illustrates additional details of example elements of a training subsystem of a classification service, according to at least some embodiments.

FIG. 4 illustrates example submissions of batches of labels to a classification service, asynchronously with respect to the start and end of classifier training iterations, according to at least some embodiments.

FIG. 5 illustrates an example of changing criteria for selecting labeling feedback candidates over time during classifier training, according to at least some embodiments.

FIG. 6 illustrates example modes of classifier training, with respect to the extent to which decisions are made in an automated manner by the classification service, according to at least some embodiments.

FIG. 7 illustrates an overview of an example interactive interface which may be used to display labeling feedback candidates and obtain labels to be used for training classifiers, according to at least some embodiments.

FIG. 8 illustrates examples of the use of highlighting to distinguish terms or tokens within labeling feedback candidates displayed via an interactive interface, according to at least some embodiments.

FIG. 9 illustrates an example scenario in which a label provider may be requested, via an interactive interface, to reconsider whether a previously-provided label is appropriate for a labeling feedback candidate, according to at least some embodiments.

FIG. 10 illustrates examples of interface elements that may be used to indicate user-defined labels and recommended token sets for searches, according to at least some embodiments.

FIG. 11 illustrates an example interactive interface element for displaying class distribution information, according to at least some embodiments.

FIG. 12 illustrates example interactive interface elements that indicate the fraction of training observations whose class has not yet been determined, according to at least some embodiments.

FIG. 13 illustrates an example interactive interface element that provides summarized information about a set of status indicators, according to at least some embodiments.

FIG. 14 illustrates an example interactive interface element that provides historical information about a set of status indicators, according to at least some embodiments.

FIG. 15 illustrates example interactive interface elements that provide information about a set of selected diagnosis tests pertaining to classifier training completion, according to at least some embodiments.

FIG. 16 illustrates aspects of an example configuration setup tab of an interactive interface for training classifiers, according to at least some embodiments.

FIG. 17 illustrates additional aspects of an example configuration setup tab of an interactive interface for training classifiers, according to at least some embodiments.

FIG. 18 illustrates aspects of an example class range definition tab of an interactive interface for training classifiers, according to at least some embodiments.

FIG. 19 illustrates aspects of an example labeling feedback tab of an interactive interface for training classifiers, according to at least some embodiments.

FIG. 20 illustrates aspects of an example evaluation tab of an interactive interface for training classifiers, according to at least some embodiments.

FIG. 21 illustrates aspects of an example training effort pause and termination tab of an interactive interface for training classifiers, according to at least some embodiments.

FIG. 22 illustrates a high-level overview of invocations of application programming interfaces for interactions between clients and a machine learning service utilizing interactive labeling feedback for classifier training, according to at least some embodiments.

FIG. 23 illustrates example elements of a programmatic request to initiate training of a classifier, according to at least some embodiments.

FIG. 24 illustrates an example scenario in which the set of candidate data items presented for labeling feedback may be customized for respective label providers, according to at least some embodiments.

FIG. 25 illustrates an example provider network environment in which a classification service may be implemented, according to at least some embodiments.

FIG. 26 is a flow diagram illustrating aspects of operations that may be performed to train classifiers with the help of interactive labeling feedback sessions, according to at least some embodiments.

FIG. 27 is a flow diagram illustrating aspects of operations that may be performed during interactive labeling sessions of a classification service, according to at least some embodiments.

FIG. 28 is a flow diagram illustrating aspects of operations that may be performed to present visual representations of training status indicators during classifier training, according to at least some embodiments.

FIG. 29 is a block diagram illustrating an example computing device that may be used in at least some embodiments.

While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to. When used in the claims, the term “or” is used as an inclusive or and not as an exclusive or. For example, the phrase “at least one of x, y, or z” means any one of x, y, and z, as well as any combination thereof.

DETAILED DESCRIPTION

Various embodiments of methods and apparatus for efficient training of machine learning models such as classifiers, using an automated workflow comprising intelligently guided labeling feedback sessions, are described. In some embodiments, easy-to-use interactive programmatic interfaces may be implemented to simplify and speed up the process of obtaining labels for data items that are deemed likely to contribute to faster learning. In various embodiments, customizable visualizations of training progress may be provided, and a number of techniques involving the provision of automated recommendations leading to rapid development of the models may be implemented. The provision of labeling feedback in a systematic manner so as to enable rapid development of machine learning models may be referred to as “teaching” the machine learning system in various embodiments, and the set of tools and interfaces used for managing the training of the models may be referred to as an interactive machine training tool in such embodiments. By way of example, classifiers or classification models, in which individual data items are categorized into one of a discrete set of pre-defined classes by the machine learning models, are used to illustrate various aspects of the techniques and interactive interfaces used for speeding up machine learning training procedures in much of the following description. Note that similar approaches may also be used for other types of machine learning problems, such as regression-type problems, with similar levels of success in at least some embodiments.

The techniques and algorithms described for efficient development of models such as classifiers may be implemented as part of a network-accessible machine learning service or a network-accessible classification service in at least some embodiments. Such a service may, for example, help to streamline various stages of the workflow of building classifiers, from data collection through training iterations to evaluation and deployment of the models, in various embodiments. The service may support the automation of, among other parts of the workflow, the following steps in at least some embodiments: (a) gathering of raw data items pertinent to a particular classification problem, (b) active learning, in which unlabeled documents may be ranked in order of potential training/learning benefit or impact, (c) user-controlled teaching, in which respective label providers can focus their labeling efforts on subspaces of the data items, (d) optimized training iterations, in which the service may try a variety of models and/or hyper-parameter combinations and select the best-performing combination, (e) justification-based debugging and analysis of the models, (f) continuous performance evaluation as training iterations proceed, (g) plug-in support for different types of input data items or documents, feature processing, etc., (h) customizable export of data, including labels, intermediate results, final results and the like for consumption by other automated systems, and/or (i) customizable diagnosis and tracking of trends in various training progress metrics, and the like.

As one skilled in the art will appreciate in light of this disclosure, certain embodiments may be capable of achieving various advantages, including some or all of the following: (a) reducing the overall amount of CPU, memory, networking and storage resources that have to be utilized for developing machine learning models of a desired quality level, even in scenarios in which hundreds of millions of unlabeled data items or observations pertaining to a given machine learning problem are available, (b) enhancing the user experience of at least three types of users interacting with an automated machine learning environment: individuals or entities that specify the machine learning problems to be solved using machine learning models, label providers for data items used for training models, and data scientists or machine learning experts who may wish to analyze or debug the models being developed, (c) simplifying the presentation of potentially complex combinations of metrics and analysis results (for example by incorporating multi-dimensional information regarding the distribution of predicted classes within a simple two-dimensional ribbon that takes up only a small portion of the user interface, or by using color shades/intensities to visually indicate/highlight significant attributes relevant to respective classes) to enable even non-experts in machine learning to interpret the available information, and/or (d) identifying label providers that are experts at specific aspects of one or more machine learning problem domains, thereby further reducing the overall time and computational resources needed to train machine learning models. In some embodiments, the resources and/or time required to generate a classifier may be reduced by orders of magnitude relative to at least some conventional techniques.

According to some embodiments, a system may comprise one or more computing devices of an artificial intelligence-based classification service. The computing devices may perform one or more interactive classifier training iterations until a training completion criterion is met. The set of classes into which data items are eventually to be categorized for a given classification problem, and for which respective labels may be required for at least some data items to be used as members of the classifier training data, may be termed “target” classes in at least some embodiments. Binary as well as multi-class classification efforts may be supported by the classification service in at least some embodiments. In various embodiments, a given training iteration may include obtaining, via an interactive programmatic interface, respective class labels for at least some data items of a particular set of data items identified as candidates for labeling feedback in a previous interactive training iteration. At least some class labels may be obtained asynchronously with respect to the start or end of the given training iteration in some embodiments—that is, individuals selected as label providers may submit respective batches of one or more labels at any convenient time relative to training iterations being performed by the computing devices. The given training iteration may, in some embodiments, also comprise generating, using one or more classifiers, classification predictions corresponding to a test set of data items. An individual classifier may be trained using a training set that includes at least a portion of a data item of the particular set for which labels were obtained in such embodiments. Based at least in part on an analysis of the classification predictions, an active learning algorithm and/or an intelligent sampling algorithm, another set of data items may be identified as candidates for labeling feedback for a subsequent training iteration in some embodiments. Thus, in at least one embodiment, at least three types of operations may be performed during a given training iteration: (a) labels may be collected and accumulated asynchronously for some set of data items that were selected for labeling feedback during some earlier iteration, (b) an updated version of one or more classifiers may be trained using a training set which includes labels obtained earlier, and (c) a new set of labeling candidates may be selected based on the results obtained from the updated version of the classifier(s).

After the overall training completion criteria have been satisfied, a particular classifier (which may be referred to as the “published” or “final” classifier) may be used to obtain classification predictions with respect to one or more data items that were not used for the training iterations, and the obtained predictions may be provided to one or more destinations in various embodiments. The published classifier that is used to make post-training classification predictions may also be trained using at least some data items whose labels were obtained from the label providers. In at least some embodiments, at least two types of models may be trained iteratively: (a) a set of one or more models whose output with respect to a training set is used to select candidates for labeling feedback for subsequent training iterations (e.g., using an active learning algorithm which uses variance in predictions among the different models for a given data item), and (b) a final (with respect to a current training iteration) model whose results are used to evaluate the overall progress towards the training objectives, to identify attribute values that are correlated with membership in different classes, and so on. The latter type of model may be termed the “iteration-final” model in some embodiments. A variety of training completion criteria may be employed in different embodiments—e.g., in some embodiments, training may be considered complete after acceptable results are obtained on a set of diagnosis tests, or when the values of selected metrics meet threshold criteria, while in other embodiments, the training iterations may simply be terminated when a budget of resources is used up.

According to at least one embodiment, a “committee” comprising a plurality of classifiers may be used to identify candidates for labeling feedback during a given training iteration. For example, with respect to a set of data items (D1, D2, . . . , Dp), respective class predictions may be obtained from each of several classifiers (C1, C2, . . . , Ck), where each of the classifiers may have been trained using a different training set in one such embodiment. A measure of variation among the predictions generated by the different models may be computed, and those data items whose variation measures meet a threshold criterion may be selected as candidates for labeling feedback in the next training iteration in some embodiments. Such a variation-based selection of candidates for labeling feedback may be based, for example, on the intuition that if the different classifiers are unable to “agree” on the class of a given data item, that data item is more likely to be difficult to classify, and therefore more likely to help improve the quality of the model once it is labeled and included in a training set for subsequent training iterations. In at least one embodiment, a k-fold cross validation algorithm may be used to select candidates for labeling feedback. In at least one embodiment, instead of or in addition to using such a variance metric to rank data items as candidates for labeling feedback, other metrics (such as proximity of a predicted classification score to a class boundary) may be used, and/or other types of active learning approaches may be employed.
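
By way of illustration only, the following minimal Python sketch shows one way such committee-based (query-by-committee) candidate selection could be implemented; the use of scikit-learn logistic regression members, bootstrap resampling, and vote entropy as the variation measure are assumptions for this sketch, not details specified by the service.

    # Hypothetical sketch of committee-based candidate selection
    # (query by committee); model choice and scoring are illustrative.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def select_labeling_candidates(X_labeled, y_labeled, X_unlabeled,
                                   committee_size=5, num_candidates=10,
                                   seed=0):
        """Rank unlabeled items by committee disagreement (vote entropy)."""
        rng = np.random.default_rng(seed)
        n = len(y_labeled)
        votes = []
        for _ in range(committee_size):
            # Each committee member is trained on a different bootstrap
            # resample of the currently labeled data.
            idx = rng.integers(0, n, size=n)
            member = LogisticRegression(max_iter=1000)
            member.fit(X_labeled[idx], y_labeled[idx])
            votes.append(member.predict(X_unlabeled))
        votes = np.stack(votes)          # shape: (committee_size, num_items)
        disagreement = []
        for item_votes in votes.T:
            _, counts = np.unique(item_votes, return_counts=True)
            p = counts / counts.sum()
            disagreement.append(-(p * np.log(p)).sum())  # vote entropy
        # Items the committee "agrees" on least are ranked first.
        return np.argsort(disagreement)[::-1][:num_candidates]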

In some embodiments, a pool of label providers may be selected, and each such label provider may be presented, via an interactive interface such as a web page, with a respective set of data items for which labeling feedback is desired. Such label providers may issue respective “submit” requests to provide batches of one or more labels to the computing devices where training sets for subsequent iterations are identified. Different label providers may submit batches of labels asynchronously with respect to each other, and asynchronously with respect to the start or end of the training iterations, in at least some embodiments. The set of newly-labeled (or re-labeled) items that are to be included in the training sets for the next iteration of model training may be accumulated as more submissions are received from the label providers in such embodiments. That is, in such embodiments, a label provider need not wait for a training iteration to complete before submitting the next batch of labels, and the back-end resources at which the models are being trained need not wait for any particular batch of label submissions—each side may proceed asynchronously with respect to the other, as long as more labels are gradually provided over time.

In some embodiments, a number of candidate data items selected for labeling feedback, e.g., using a committee of models, may be subdivided into smaller subsets, e.g., comprising N data items each, and only an individual subset may be presented to a label provider for feedback at a time. For example, from among 200 data items for which labels are desired, 10 items may be presented to a given label provider at a time, so as not to overwhelm the label provider with too many items at once. After a given subset is labeled and submitted, the next subset may be presented to the label provider. The label providers may, in at least some embodiments, be able to specify a desired batch size (the number of candidates that are presented to them for labeling feedback at a time) via a programmatic interface.
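
A minimal sketch of this subset-at-a-time presentation, assuming a simple generator over an already-ranked candidate list (the batch size of 10 mirrors the example above; the UI call is hypothetical):

    def batched_candidates(ranked_candidate_ids, batch_size=10):
        """Yield labeling candidates one label-provider-sized batch at a
        time, e.g. 10 of 200 selected items per presentation."""
        for start in range(0, len(ranked_candidate_ids), batch_size):
            yield ranked_candidate_ids[start:start + batch_size]

    # Hypothetical usage: present a batch, wait for the provider's
    # "submit", then present the next batch.
    # for batch in batched_candidates(candidates, batch_size=10):
    #     present_to_label_provider(batch)   # assumed UI call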

In at least one embodiment, the label providers (and/or other users) may be able to provide additional guidance, in the form of filtering criteria, to the classification service, to determine the set of data items that are presented to the label providers via an interactive interface. For example, label providers may submit search predicates or queries (e.g., sets of search tokens or attribute values), class labels, and the like to be used to filter the data items in some embodiments. In at least some embodiments, the classification service may suggest or provide recommended search terms to be used for filtering, e.g., based on analysis of the correlations between different terms and class membership, or based on the presence of the search terms in data items that have been found to be more difficult to classify than others.

As mentioned earlier, several steps of the workflow for training classifiers may be automated and simplified by the classification service in some embodiments. According to some embodiments, the classification service may implement one or more programmatic interfaces, such as an interactive web-based interface, a set of application programming interfaces (APIs), command-line tools, graphical user interfaces and the like, that can be used to initiate one or more of the steps. In one embodiment, a client of the service (such as an individual authorized to initiate classifier development) may submit a programmatic request via such an interface, indicating a data source from which various input data items can be obtained, and/or a classification objective (e.g., an explanation of how data items are to be categorized among a specified set of categories). The classification service may obtain or extract data items from the data source for different training iterations, and present an indication of the classification objective to a label submitter to enable the label submitter to perform his/her labeling duties in various embodiments. In at least some embodiments, the programmatic interfaces may be used to specify feature processing operations to be applied to the data items, such as various vectorization algorithms to be used to generate feature vectors from the raw versions of the data items, which can then be consumed as input by the classifiers. The types of models (such as logistic regression models, neural network models and the like) that are to be developed for classification, together with various hyperparameter settings to be tried, may also be specified via programmatic interfaces in at least some embodiments.
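
Purely as an illustration of the kind of information such a programmatic request might carry, a client-side payload could look like the following Python sketch; all field names, values and the submit call are hypothetical and are not the service's actual API.

    # Hypothetical training-request payload; every name here is assumed.
    training_request = {
        "data_source": "s3://example-bucket/catalog-items/",  # assumed URI
        "classification_objective": "Identify items subject to regulation R",
        "target_classes": ["affected", "not-affected"],
        "feature_processing": {
            "vectorizer": "tfidf",          # applied to text attributes
            "ngram_range": [1, 2],
        },
        "model_candidates": [
            {"algorithm": "logistic_regression", "C": [0.1, 1.0, 10.0]},
            {"algorithm": "neural_network",
             "hidden_layers": [[64], [128, 64]]},
        ],
        "label_provider_pool_size": 5,
    }
    # client.submit_training_request(training_request)  # assumed client call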

In at least one embodiment, the classification service may employ one or more label providers for a given classification problem, such as subject matter experts with respect to the problem domain, volunteers, or a group of individuals who have been identified via a web-based task marketplace (e.g., a web site at which individuals may register their interest in performing tasks such as labeling data items for a fee). The cardinality of the set of label providers may be selected based on various factors in different embodiments, and may change over time as the training iterations proceed in at least one embodiment. The factors may for example include the size of the data set available, the expected variation among the characteristics of the data items, the type of model being developed (e.g., larger training data sets may be needed if a neural network-based model is being generated than if a logistic regression model is being generated), a desired label provider pool size indicated by a requester for the classifier, a budget available for labeling, and so on. In at least some embodiments, as the training iterations proceed, the interactions with individual label providers may be analyzed, e.g., to determine which label providers are more proficient in identifying particular classes of data items, to determine the rate at which individual label providers are able to generate labels, and so on. In at least some embodiments, the classification service may enable some users to view the sets of labels provided by an individual label provider, and/or one or more metrics pertaining to label submission by the label provider, such as the rate at which labels are generated, a comparison of the labels with respect to predicted classes, and so on. Based on the characteristics of individual labelers revealed by analysis of their interactions with the classification service, in at least one embodiment, respective groups of feedback candidate data items may be identified and presented for different label providers. For example, in a scenario in which data items are to be classified as being members of one of three categories CatA, CatB and CatC, a particular label provider L1 may be better at identifying CatA data items than CatB data items, so if more examples of CatA data items are needed for the training iterations, more candidate data items may be directed to L1 than to other label providers of the pool being used. In at least one embodiment, software programs may be included among label providers—that is, label providers may not be limited to humans.

According to some embodiments, the classification service may be able to detect correlations between specific input features and different target classes into which data items are being categorized. Such correlations may be used to highlight specific features (such as text tokens, images and the like) in the interactive labeling sessions with label providers in some embodiments, potentially assisting the label providers in their categorization and labeling decisions. In at least one embodiment, correlation metrics may also be used to select feedback candidate data items. In one such embodiment, another machine learning model (different from the classifiers being trained) may be generated to identify a correlated-with-classification-variation subset of properties of individual data items, such that a particular group of properties of the subset has a correlation above a selected threshold with a variation in classification predictions generated for data items that have that particular group of properties. For example, it may be discovered using such a model that if a given data item D1 has properties P1, P2 and P3, the variation in classification predictions generated for D1 by a committee of classifiers tends to be higher than a threshold T, with a positive correlation between the presence of the three properties and the high variation in the predicted class. Having determined such a subset of correlated-with-classification-variation properties, in some embodiments, the classification service may use the subset to identify at least some candidates for labeling feedback for a particular training iteration.
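
One plausible realization of this property-variation correlation is sketched below, under the assumptions that data items are represented by a binary property-presence matrix and that a per-item variance of committee predictions is already available; a Pearson correlation stands in for whatever measure the service actually uses.

    import numpy as np

    def variation_correlated_properties(presence, prediction_variance,
                                        threshold=0.3):
        """presence: (num_items, num_properties) 0/1 matrix.
        prediction_variance: per-item variance of committee predictions.
        Returns indices of properties whose presence correlates with high
        prediction variance above an illustrative threshold."""
        var_c = prediction_variance - prediction_variance.mean()
        selected = []
        for p in range(presence.shape[1]):
            col = presence[:, p] - presence[:, p].mean()
            denom = np.sqrt((col ** 2).sum() * (var_c ** 2).sum())
            if denom == 0:
                continue
            if (col * var_c).sum() / denom > threshold:
                selected.append(p)
        return selected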

Note that at least in some embodiments, an initially-provided label for a data item may not necessarily be permanent—that is, a labeling feedback candidate data item selected by the classification service may already have a label, but the label may be changed based on reconsideration by a label provider. The classification service may, for example, be able to identify possible candidates for label reconsideration based on the presence of features that are correlated highly with a class other than the one currently assigned to the data item, and present such data items via an interactive interface that indicates why the current labels may be incorrect. In at least some embodiments, one of the metrics provided by the classification service may include an indication of a number of provided labels (e.g., in some recent time interval or during some recent training iteration) that differ from the predicted classes for the data items for which the labels were provided.

A number of different types of visualization data sets and corresponding interactive interface elements may be generated in various embodiments by the classification service for presentation to clients on whose behalf the classifiers are being developed, presentation to label providers, and/or presentation to data scientists or other entities interested in analyzing/debugging the classifiers, following the classifier training progress, and so on. According to some embodiments, a guided labeling feedback session may be initiated between a label provider and a model trainer or training coordinator (e.g., a component of the classification service responsible for developing a requested classifier) by one or more computing devices of an artificial intelligence-based classification service. During such a session, the computing devices may cause one or more visualization data sets to be presented to the label provider via an interactive programmatic interface, including a particular visualization data set which comprises an ordered representation of one or more data items for which labeling feedback is requested. The order in which the data items are arranged may be based at least in part on an estimated rank, with respect to one or more metrics such as an estimated learning contribution, associated with including respective ones of the one or more data items in a training set for one or more training iterations of one or more classification models in some embodiments. In at least one embodiment, a representation of the first data item may indicate a particular attribute of the first data item whose correlation with a particular predicted class exceeds a threshold—for example, features (such as text token sets) of the input data that are highly correlated with different classes may be highlighted in different colors in some implementations.

A guided labeling feedback session may also comprise obtaining, by the one or more computing devices, respective indications from the label provider via the interactive programmatic interface of (a) respective labels for one or more data items represented in the one or more visualization data sets and (b) a filter criterion to be used to select one or more other data items to be presented via the interactive programmatic interface to the label provider in some embodiments. A variety of filter criteria may be specified in different embodiments, such as search terms or query predicates (either generated by the label provider, or recommended by the classification service and approved by the label provider), class labels (e.g., either the names of classes into which the data items are to be categorized, or labeler-created temporary class labels as discussed below), or properties of data items such as their data sources, dates of data item creation/collection, etc. In at least one embodiment, when providing a label for a particular data item, the label provider may indicate a justification for the label (e.g., the presence/absence of some set of features, the apparent similarity with another similarly labeled item, etc.), and such justification information may be stored and/or displayed later by the classification service. The labels provided in the session may be used to train a classification model in various embodiments; after the model is trained, its classification predictions with respect to various data items may be provided to one or more destinations (such as a client of a classification service, or a program which consumes the output generated by the classification service).

A number of interactive interface elements may simplify and/or guide the tasks performed by label providers in different embodiments. For example, various aspects of the statistical distribution among classes assigned to data items, resulting from the set of training iterations that have been completed thus far, may be presented visually to the label providers in some embodiments. Such interfaces, such as a zoom-in-capable ribbon interface described below, may enable the labelers to examine or re-examine data items that are designated as belonging to a particular class with a classification score within a particular range, to examine data items that have not yet been assigned a class with a desired level of confidence, and so on. In effect, in one embodiment, a label provider may indicate an “examples-requested” region of the statistical distribution as part of the filtering criteria to be used to present additional data items to the label provider. Portions of such an interface may be used to demarcate class boundaries in some embodiments—e.g., in a scenario in which binary classification is being performed, boundary markers may be used to provide a visual indication of the fraction of the data items that have been designated as belonging to each of the two classes being considered, and the fraction that have not yet been classified with a targeted confidence level.
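
As a concrete (and purely illustrative) example of the data underlying such a ribbon display for binary classification, the fractions it demarcates could be computed from per-item classification scores as follows; the confidence-band boundaries are assumptions, not service defaults.

    import numpy as np

    def ribbon_summary(scores, low=0.35, high=0.65):
        """Given binary classification scores in [0, 1], compute the
        fractions a class-distribution ribbon might demarcate; the band
        boundaries (low, high) are illustrative."""
        scores = np.asarray(scores)
        n = len(scores)
        return {
            "class_0_fraction": float((scores < low).sum()) / n,
            "unresolved_fraction":
                float(((scores >= low) & (scores <= high)).sum()) / n,
            "class_1_fraction": float((scores > high).sum()) / n,
        }

    # An "examples-requested" region could then translate into a simple
    # score-range filter, e.g. items with 0.45 <= score <= 0.55.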

Visualization data sets pertaining to selected training status indicators and diagnosis tests may be generated and presented via interactive interfaces in at least some embodiments. According to some embodiments, one or more computing devices of a classification service may determine, corresponding to individual ones of a plurality of classifier training iterations, respective sets of status indicators. A first set of such status indicators may, for example, include (a) a representation of a fraction of a first set of data items for which classification results that have been obtained in a particular classifier training iteration meet a threshold criterion and (b) a representation of a stability trend of a particular training metric over a plurality of classifier training iterations. A training data set of the particular classifier training iteration may, for example, comprise at least some labels obtained in response to a presentation of one or more data items of the first set as candidates for labeling feedback in various embodiments. In response to a programmatic request, a visualization data set representing at least one set of status indicators may be presented via an interactive programmatic interface in one embodiment. A presentation of the visualization data set may, for example, include an indication, within a first display, of (a) respective values of a plurality of selected status indicators as of a first classifier training iteration and (b) a plurality of values of an individual status indicator as of respective successive classifier training iterations in some embodiments—that is, the display may make it easy to see the values of several different metrics as of a given training iteration, and also make it easy to identify trends in any one of the status indicators or metrics.

In some embodiments, one or more training enhancement actions may be initiated (e.g., after the visualization data set has been presented, or asynchronously with respect to the presentation of the visualization data set) to meet one or more goals or objectives associated with or expressed using the status indicators. For example, one such training enhancement action may comprise selecting, by the computing devices based at least in part on an objective associated with a particular status indicator, one or more data items for which respective labeling feedback is to be obtained programmatically in a subsequent classifier training iteration. Other training enhancement actions may include, for example, directing a larger number of feedback candidate data items to specific label providers selected based on learning about the capabilities of the different label providers, modifying one or more hyper-parameters, and the like. Eventually, e.g., after one or more training objectives have been met, a trained version of a classifier (which has been trained using a data set that includes labels obtained as a result of a training enhancement action) may be used to obtain and provide classification predictions for various data items.

According to at least some embodiments, programmatic interfaces such as interactive web pages, graphical user interfaces, command line tools or APIs may be used by clients or users of the classification service to indicate a set of training metrics and associated objectives, for which corresponding status indicators included in the visualization data sets may be determined. Any desired combination of a wide variety of metrics may be indicated via the programmatic interfaces in different embodiments, depending on the type of classification problem (binary versus multi-class) being addressed, such as (among others) positive predictive value (PPV), negative predictive value (NPV), accuracy, prevalence, precision, false discovery rate, false omission rate, recall, sensitivity, diagnostic odds ratio, coverage, and/or an F1 score. Respective objectives may be defined for the different metrics via the programmatic interface, and the training status with respect to the objectives may be indicated in the visualization data set in at least some embodiments. In at least some cases, the progress indicator with respect to a given metric may be a value on a continuous range, such as “X % of objective achieved as of training iteration 112”. In other cases, the progress indicator may be binary, such as “Target X not achieved as of training iteration 122”. In at least some cases, the status indicators may be defined in terms of trends, as in the case where stability with respect to a given metric is used as a progress indicator.
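
For reference, several of the metrics named above follow directly from a confusion matrix; the sketch below uses the standard definitions, and only the objective-progress helper is an assumption about how an “X % of objective achieved” indicator might be computed.

    def binary_metrics(tp, fp, tn, fn):
        """Standard confusion-matrix metrics for binary classification."""
        ppv = tp / (tp + fp) if tp + fp else 0.0       # precision
        npv = tn / (tn + fn) if tn + fn else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0    # sensitivity
        accuracy = (tp + tn) / (tp + fp + tn + fn)
        f1 = 2 * ppv * recall / (ppv + recall) if ppv + recall else 0.0
        return {"PPV": ppv, "NPV": npv, "recall": recall,
                "accuracy": accuracy, "F1": f1}

    def objective_progress(value, target):
        """Continuous progress indicator, e.g. '85% of objective achieved
        as of the current training iteration' (illustrative)."""
        return min(100.0, 100.0 * value / target)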

In various embodiments, new labeled examples may be added to training sets as the training iterations proceed, e.g., as new batches of labels are submitted by label providers in guided labeling sessions of the kind discussed above. In some embodiments, in response to input received via an interactive programmatic interface, the classification service may provide an indication of a difference in training data sets between one training iteration and another, which may help data scientists or other analysts to understand why some status indicators have changed between the iterations. In at least one embodiment, the classification service may itself perform an analysis of the difference in training data sets between a pair of training iterations, and provide an indication of one or more candidate explanatory factors associated with a difference in training metrics between the pair of training iterations. Such explanatory factors may include, for example, a histogram or some other summarization of significant token sets or terms in the training sets used for different iterations—if the significant tokens are very different from one iteration to another, this may help explain the differences in the metrics.
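
A crude sketch of such an explanatory-factor analysis follows, assuming the training sets are collections of token strings; a raw frequency difference stands in for whatever significance measure the service actually applies.

    from collections import Counter

    def training_set_token_diff(docs_iteration_a, docs_iteration_b,
                                top_k=20):
        """Summarize how token frequencies shifted between the training
        sets of two iterations; the tokens with the largest shifts are
        candidate explanations for changed metrics."""
        freq_a = Counter(tok for doc in docs_iteration_a
                         for tok in doc.split())
        freq_b = Counter(tok for doc in docs_iteration_b
                         for tok in doc.split())
        deltas = {t: freq_b.get(t, 0) - freq_a.get(t, 0)
                  for t in set(freq_a) | set(freq_b)}
        return sorted(deltas.items(), key=lambda kv: abs(kv[1]),
                      reverse=True)[:top_k]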

In at least one embodiment, with respect to a subset of training metrics and associated training status, a set of diagnosis tests may be defined to help determine when the training procedure has met its overall objectives and should therefore be terminated. In effect, a given diagnosis test may provide a binary indicator of whether a given metric's status has met a particular threshold condition for publishing or finalizing the classifier being trained, and the aggregation of multiple diagnosis tests may provide an indicator of whether all the threshold conditions selected for the classifier with respect to a plurality of metrics have been met. Note that not all the metrics whose status indicators are displayed may have associated binary diagnosis tests in at least some embodiments: users may view status indications of an arbitrary collection of metrics, which may differ from the collection of metrics being used for diagnosis tests. In some embodiments, a user may indicate one or more diagnosis tests to be used via a programmatic interface to the service; in other embodiments, a default set of diagnosis tests may be selected by the classification service itself, e.g., based on the type of classification problem being solved, and the user may add/remove tests from the default set as desired. Up-to-date summaries or detailed results of the diagnosis tests may be provided via interactive programmatic interfaces upon request in various embodiments. In some embodiments, in situations where a particular diagnosis test result is unsatisfactory (e.g., if a corresponding metric has not met a targeted threshold), a recommended remedial action (such as adding more labeled examples of a particular class to the training set) may be indicated by the classification service via the programmatic interface. In one embodiment, an explanation of a remedial action may be provided as well. In some embodiments, a user may approve a recommended remedial action via an interactive programmatic interface, and in response to determining that the remedial action has been approved, the remedial action may be initiated by the classification service.
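
The aggregation of diagnosis tests described above can be pictured with a small sketch; the metric names and thresholds in the example are illustrative only.

    def run_diagnosis_tests(metric_values, tests):
        """tests maps a metric name to the threshold it must meet.
        Returns per-test pass/fail plus the aggregate indicator of
        whether all threshold conditions for publishing have been met."""
        results = {name: metric_values.get(name, 0.0) >= threshold
                   for name, threshold in tests.items()}
        return results, all(results.values())

    # Illustrative example:
    # results, ready = run_diagnosis_tests({"F1": 0.91, "recall": 0.88},
    #                                      {"F1": 0.90, "recall": 0.90})
    # results == {"F1": True, "recall": False}; ready == False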

In some embodiments, a range of automation levels may be supported with respect to the actions taken by the classification service without corresponding requests having to be submitted by a user during various stages of a workflow for classifier development, relative to the actions taken in response to user guidance. For example, a user may choose a “fully automated” mode of operation, in which case remedial actions with respect to diagnosis tests (as well as other optimization operations) may be initiated automatically by the service, or a “manual” mode of operation, in which the user may have to select and/or approve remedial actions and other optimizations. Similarly, in a fully automated mode, decisions regarding the types of classification algorithm to be used, the criteria to be used by the service to terminate training, and the like, may be made largely or entirely by the service.

In some embodiments, one or more data items may be labeled as members of a particular class, but the classification service may predict that the data items belong to a different class in at least some training iterations—that is, the conclusions reached by the classification service in such an iteration regarding class membership of one or more data items may differ from those of the user. For example, it may be the case that during training iteration T5, the service predicts that item I4 belongs to class C1, and a label provider or other user may later label item I4 as belonging to class C2. In at least some embodiments, an indication of such a labeling-versus-prediction contrast may be provided via an interactive interface, e.g., enabling a user to reconsider their labeling decision with respect to such data items. In at least one embodiment, the service may provide a justification for the prediction—e.g., by highlighting or otherwise indicating token sets or attributes of the data items that are correlated with the predicted class. The user may decide to confirm the original labeling decision, or change the label to match the predicted class. A number of additional features and capabilities that help to speed up the process of training machine learning models in various embodiments are discussed below in further detail.
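
Finding such labeling-versus-prediction contrasts is straightforward once labels and predictions are aligned; a minimal sketch follows, with the surfacing to the interactive interface left abstract.

    def labeling_prediction_contrasts(labels, predictions):
        """Return indices of items whose provided label differs from the
        predicted class, so they can be surfaced for reconsideration."""
        return [i for i, (label, pred) in enumerate(zip(labels, predictions))
                if label != pred]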

Example System Environment

FIG. 1 illustrates an example system environment in which a classification service may be implemented, according to at least some embodiments. As shown, system 100 may comprise resources and artifacts of a classification service 102, including a training subsystem 130 and a run-time subsystem 170 in the depicted embodiment. Raw data items to be used to train models and/or to exercise trained models 114 may be extracted from a variety of data sources in some embodiments, including static data sources 140A and dynamic data sources 140B (such as streaming data services which collect and/or emit data items on an ongoing or continuous basis, e.g., based on signals collected at various types of sensors or other devices). A variety of machine learning algorithms may be available at library 120 for use during various stages of training, evaluation and/or execution of classifiers for numerous types of problem domains, including for example neural network based algorithms, logistic or other regression algorithms, tree-based algorithms such as Random Forest algorithms, and the like.

The classification service 102 may implement a collection of programmatic interfaces 177, including for example web sites or pages, graphical user interfaces, command line tools and/or APIs, which can be used for interactions between various types of users and the service 102 in various embodiments. At least three broad categories of programmatic user interactions may be supported in the depicted embodiment: classifier training setup sessions 181, debug/analysis sessions 182 and interactive labeling sessions 183. In training setup sessions 181, in some embodiments an authorized user such as a business unit or department manager or some other stakeholder interested in classifying a collection of data items obtained from data sources 140 may use a client device 180A (e.g., a desktop, laptop, tablet computing device, smart phone or the like) to initiate the process of training one or more classifiers, e.g., by submitting a training request to the service 102. In debug/analysis sessions 182, in some embodiments data scientists, subject matter domain experts and the like may use client devices 180B to examine the progress of classifier training, and to modify various parameters or hyper-parameters based on observed metrics, status information and the like. In interactive labeling sessions 183, in various embodiments one or more label providers may be presented at client devices 180C with candidate data items for which labeling feedback is requested, and such label providers may submit labels for the candidate items, submit filtering requests to view and/or label additional data items, and so on. Intuitive, easy-to-use, feature-rich customizable interactive programmatic interfaces 177 may be provided for each of the three categories of user sessions indicated in FIG. 1 in various embodiments; details of various aspects of the interfaces are provided below. Note that a single user session need not necessarily be limited to one type of interaction in at least some embodiments—e.g., a single user may set up classifier training, provide labels when needed, and debug/analyze the progress of the training via the same session. At the classification service 102, one or more interaction interface managers 155, implemented using one or more computing devices, may receive messages submitted programmatically from the client devices 180, pass on internal versions of the communications to other components of the service 102, receive internal responses from such components, and provide external responses to the messages via the programmatic interfaces 177 to the users as needed.

In many cases, at least some of the raw data items generated at and available from data sources 140A and 140B pertaining to a particular problem to be addressed using a classifier may be unlabeled. In order to train a classifier, a sufficient number of the data items may have to be labeled in various embodiments, and the exact number that is ultimately sufficient to attain a desired classification quality may vary with the data set, the classification algorithm(s) being used and the subtlety or difficulty of the classification problem domain. In general, since the labeling task may require sophisticated human judgments, this phase of obtaining the training data set can be quite time consuming and resource intensive—for example, for some neural network based algorithms, millions of data items may potentially have to be labeled. In various embodiments, the training of a classifier (or an ensemble of classifiers) developed for a particular classification problem may be accelerated at the service 102 using the following high level approach.

Initially, after the set of classes into which the data items are to be categorized is identified (e.g., in a training setup session 181), a small subset of data items may be labeled using a fairly rough approach (using, for example, simple keyword-based labeling or even random labeling) in various embodiments to obtain an initial training set. Then, a sequence of training sessions may be initiated in at least some embodiments, in each of which a current version of one or more classifier models (such as a committee of models trained on a respective random subset of the currently available training set) may be trained, e.g., by training/evaluation coordinators 174 using training resources 110 of subsystem 130. Training/evaluation coordinators 174 may be referred to as model generators in some embodiments. A set of interactive guided labeling sessions 183 may be set up (e.g., between model generators and label providers, with the help of interaction interface managers 155) to gradually expand the training set as the training iterations proceed in various embodiments. Analysis/debug sessions 182 may be used in some embodiments to help initiate various training enhancement actions based on various training status indicators, such as increasing the number of labeling candidates that are likely to belong to a subset of the classes being considered, adding new labeling sessions, etc.

The results obtained in a given training iteration may be utilized, e.g., by active learning-based labeling candidate selectors 150, to identify an additional set of data items for which labeling feedback is expected to help improve the classifiers more quickly than other data items, and such candidates may be presented to label providers in the interactive labeling sessions. The active learning methodology may be employed in the depicted embodiment based on the intuition that some data items can provide more substantive contributions to the learning of the models than others—for example, labeled data items that are very easy to classify may not help the model's learning very much, while data items that are close to class boundaries and are therefore more difficult to classify may be more useful for accelerating learning. Any combination of one or more active learning algorithms, including for example query by committee, uncertainty sampling, expected model change algorithms, expected error reduction algorithms, variance-reduction algorithms, and/or density-weighted algorithms, may be used in various embodiments. The training iterations, the presentation to label providers of the candidate data items, and the submission of additional or corrected labels by the label providers, may all be asynchronous with respect to one another in various embodiments—e.g., the presentation of candidate data items may not have to wait for a training iteration to complete, and label providers may submit labels and/or filtering requests at any time, independently of when a training session starts or ends. The overall objective of the iterative interactive training procedure may in various embodiments comprise quickly gathering, given the set of label providers and budget available, a reasonable training data set to achieve a desired level of quality for the classifier(s) being generated, while simplifying the user experience of the various entities interacting with the service.
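
As one concrete instance of the uncertainty-sampling family mentioned above, items whose predicted scores lie closest to the class boundary can be ranked first; a minimal sketch follows, with the boundary at 0.5 taken as an assumption for binary classification.

    import numpy as np

    def uncertainty_ranking(scores, boundary=0.5):
        """Rank items by proximity of their predicted score to the class
        boundary; items the model is least certain about come first."""
        scores = np.asarray(scores)
        return np.argsort(np.abs(scores - boundary))

    # Hypothetical usage with a probabilistic classifier:
    # candidates = uncertainty_ranking(model.predict_proba(X)[:, 1])[:10]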

In at least some embodiments, during a given training iteration, classification predictions corresponding to a test set of data items may be generated from each of a plurality of classifiers, where individual classifiers of the plurality are trained using a training subset that includes labels obtained from the label providers. The training data sets of the classifiers may differ from one another in some embodiments. Based on an analysis (e.g., a variance analysis) of the classification predictions generated by the different classifiers, filtering criteria indicated by the label providers, and/or a sampling algorithm, a new set of candidate data items for labeling feedback may be identified, and labels obtained for the new set may be used to gradually expand the training sets available for classifiers over time in various embodiments. Any of a variety of active learning algorithms and techniques (some of which may not necessarily be variance based) may be employed in different embodiments to help select labeling candidates likely to be more useful for learning than others. In one embodiment, several different classification algorithms may be employed with the same training data set, and the results of the different algorithms may be compared with one another, with items whose class predictions differ widely among the algorithms being designated as difficult to classify and therefore good candidates for labeling feedback.

In at least some embodiments, in addition to a group of classifiers used to help identify the next set of labeling feedback candidates, a final (with respect to the current training iteration) classifier may also be trained in a given iteration, e.g., using all the labeled training data available, and the results obtained from the final-with-respect-to-the-current-iteration classifier on a test set may be used to evaluate whether quality-related training completion criteria have been met. Of course, training iterations may also be terminated for reasons other than achieving a desired level of classification quality with respect to various measures—e.g., training may be concluded when a budget of resources or time is exhausted in some embodiments, even if all the classification quality goals have not been reached.

In various embodiments, presentation of visualization data sets via intuitive interactive interfaces may play a key role in accelerating the development of high-quality classifiers. In the labeling sessions 183, for example, labeling candidate items may be presented in an order based on respective rankings of the data items with respect to contribution towards attaining one or more training objectives in some embodiments, so that if a label provider is only able to provide a few labels, the most useful labels (with respect to learning benefit/contribution) are more likely to be obtained first. Token sets or other features that are highly correlated with membership in a particular class may be highlighted to help label providers make their decisions in some embodiments, and interface elements that enable label providers to narrow down the set of data items they wish to inspect and/or label may be provided in various embodiments. With respect to analysis and/or debugging of the classifiers as they are being trained, interface elements that enable users to specify or modify a set of metrics to be tracked, to display the change in metric values or status over time, to determine how many diagnosis tests have been satisfied, and/or to approve or specify training enhancement actions associated with metrics of interest may be provided in various embodiments. Additional examples and details of various interface elements that may help different categories of users to understand, guide and speed up the training of classifiers are provided below.

After training is concluded, in at least some embodiments the trained models 114 (e.g., the most recent version of the final classifier) may be published or accepted for production use. Execution coordinators 175 of the run-time subsystem 170 may use model execution resources 132 to run the trained models 114 to generate class predictions 160 corresponding to various data items that were not part of the training sets in the depicted embodiment. In various embodiments, one or more computing devices may be employed for individual ones of the components shown in FIG. 1, such as the training subsystem, the run-time subsystem, the interaction session managers, and/or the labeling candidate selectors.

Training Subsystem

FIG. 2 illustrates example components of a training subsystem of a classification service, according to at least some embodiments. In the depicted embodiment, training subsystem 240 (which may be similar in capabilities and features to training subsystem 130 of FIG. 1) may comprise, for example, a data item retrieval subsystem 210, an item-specific vectorization subsystem 212, a global vectorization subsystem 214, an interactive session input analysis subsystem 216, a search subsystem 218, an active learning subsystem 220, an iteration-final model training/evaluation subsystem 222, and an interactive session output presentation subsystem 224. In other embodiments, the training subsystem may comprise other combinations of subcomponents.

In various embodiments, the data sources 201 whose items have to be categorized may comprise static collections of data items, dynamic (e.g., streaming) collections of data items, or a combination of the two. For example, in some large-scale e-retailing environments, the data sources may comprise entries of an expanding catalog, and a particular binary classification task to be accomplished may comprise determining which items of the catalog are affected by a particular legal or regulatory requirement. The data item retrieval subsystem 210 may be responsible for extracting data items from a variety of data sources 201 in various embodiments, e.g., employing data-source-specific APIs, performing some level of normalization on the raw data items retrieved, and so on. A data retriever may be defined in several ways in the depicted embodiment—e.g., by providing a lookup predicate to be used to search or filter a data source, by indicating parent nodes in a previously-created hierarchy of data items (such that data items corresponding to child nodes of the parent nodes should be retrieved), by indicating other classifiers whose output is to be used as input for a new classifier, and so on. As with other components of the classification service, the data item retrieval subsystem may be designed to be extensible and customizable in various embodiments—e.g., users may add modules to access different types of data sources as desired, to utilize preferred data source access APIs, to perform different types of normalization on raw data item contents, and so on.
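
One way to make the retrieval layer concrete is sketched below, modeling a retriever defined by a lookup predicate (one of the definition styles mentioned above). The class and method names are hypothetical, chosen only for illustration of the extensibility described in the text.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Iterable, Iterator

class DataRetriever(ABC):
    """Extensible retriever interface; subclasses encapsulate
    data-source-specific access and normalization (hypothetical)."""

    @abstractmethod
    def retrieve(self) -> Iterator[Any]:
        ...

class PredicateRetriever(DataRetriever):
    """A retriever defined by a lookup predicate applied to a source."""

    def __init__(self, source: Iterable[Any],
                 predicate: Callable[[Any], bool]):
        self.source = source        # any iterable of raw data items
        self.predicate = predicate  # item -> bool lookup predicate

    def retrieve(self) -> Iterator[Any]:
        for item in self.source:
            if self.predicate(item):
                yield self.normalize(item)

    def normalize(self, item: Any) -> Any:
        # Placeholder normalization hook; customizable per data source
        return item
```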

In at least some embodiments, at least some of the attributes or elements of raw data items (e.g., text tokens, images, other unstructured fields and the like) may have to be transformed into vectors before being used as input for classification or other machine learning models. Such transformations may represent one example of feature processing performed at the classification service, e.g., based on user requests or automatically. In the depicted embodiment, two types of vectorization may be performed—global vectorization at subsystem 214, in which properties or attributes common to all data items retrieved for a particular classification problem may be transformed into vectors, and item-specific vectorization performed at subsystem 212, in which attributes that may not be shared among all the data items may be vectorized and/or combined with the outputs of the global vectorization subsystem 214 to generate the final feature vectors for individual data items. Common components of vectorizers (used at either the global level or for item-specific vectorization) may include, among others, tokenizers for text, case normalizers, n-gram extractors, term-frequency-inverse-document-frequency (tf-idf) generators, one-hot encoders, bucket generators for discretizing continuous-valued numerical attributes, and the like in various embodiments. In some embodiments, the classification service may provide default implementations of various types of vectorizers, and users may customize or extend the vectorizers as desired. A matrix comprising feature vectors for a plurality of data items may be prepared as output of the vectorization subsystems in some embodiments. In at least one embodiment, attributes extracted from individual data items may be provided as input to a search subsystem 218 from the vectorization subsystems, where for example an inverted index may be created on the attribute values to enable filtering based on the attribute values.
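
A minimal sketch of how such vectorizer components might be composed, assuming scikit-learn as the feature-processing library; the attribute names (titles, categories, prices) are invented for illustration, and the service's actual vectorizers may differ.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

def vectorize_items(titles, categories, prices):
    """Compose common vectorizer components into one feature matrix,
    one row per data item."""
    # Text tokens -> tf-idf features (tokenization and case
    # normalization are handled by the vectorizer)
    tfidf = TfidfVectorizer(lowercase=True, ngram_range=(1, 2))
    text_features = tfidf.fit_transform(titles)

    # Categorical attribute -> one-hot encoding
    onehot = OneHotEncoder(handle_unknown="ignore")
    cat_features = onehot.fit_transform(
        np.array(categories).reshape(-1, 1))

    # Continuous numeric attribute -> discretized buckets
    buckets = KBinsDiscretizer(n_bins=5, encode="onehot",
                               strategy="quantile")
    price_features = buckets.fit_transform(
        np.array(prices, dtype=float).reshape(-1, 1))

    # Final per-item feature vectors
    return hstack([text_features, cat_features, price_features])
```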

Input generated during interactive labeling sessions by various label providers and/or by other users may be examined at the session input analysis subsystem 216 in the depicted embodiment. The input may comprise, for example, the labels designated by the label providers for various displayed candidate data items, search terms for filtering, and/or other types of input such as zoom-in requests for various class members, and the like. At the search subsystem 218, the search terms or predicates may be used to help identify the subset of data items for which labeling feedback is to be requested next from label providers in the depicted embodiment. In the active learning subsystem 220, in some embodiments, the results obtained from one or more classifiers (e.g., a committee of classifiers) may be examined in conjunction with the filtering requests received from the labelers, and a ranking of a selected set of data items may be performed in at least some embodiments, in which the data items may be arranged in order of the potential contribution of to-be-assigned labels towards one or more training objectives. In at least some embodiments, data items whose predicted classes show the greatest variation among the members of the committee may be considered harder to classify and may therefore be considered better candidates for labeling feedback.

In addition to the classifiers used for selecting candidates, an iteration-final model (also referred to as the final-with-respect-to-the-current-iteration classifier) may also be trained in at least some embodiments using resources of subsystem 222. For example, if N classifiers are used to obtain prediction variation measures to help rank the unlabeled data items, in one embodiment individual ones of the N classifiers may be trained using 1/Nth of the available training data, while the iteration-final classifier model may be trained using the entire training data set available. The quality of the predictions generated by the iteration-final model, as estimated using one or more selected metrics, may be used to determine whether additional training iterations are required in some embodiments, or whether the training procedure can be concluded. The predictions of the iteration-final model may also be used in at least some embodiments to identify terms, attributes or features that are highly correlated with membership in the various classes being considered. Results of the search, active learning and iteration-final model training/evaluation subsystems may be formatted for presentation as output by subsystem 224 in the depicted embodiment. Such displayed results may include, for example, important or significant tokens, attributes or features, training status metrics, diagnosis test results and the like in some embodiments. In at least some embodiments, the results obtained for the iteration-final model may be included in the criteria used at the search subsystem to rank data items for labeling feedback purposes—e.g., if a goal for a particular metric measured using the iteration-final model is more likely to be satisfied by obtaining a label for a data item D1 than for a data item D2, D1 may have a higher probability of being included in the search results generated at search subsystem 218.
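
The 1/N split described above might look like the following sketch, which trains each committee member on a disjoint slice of the labeled data and the iteration-final model on all of it. Logistic regression stands in for whichever classification algorithm is configured; all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_committee_and_final(X, y, n_members=4, seed=0):
    """Train N committee members on disjoint 1/N slices of the labeled
    data, plus an iteration-final model trained on all of it.

    X: feature matrix (array or sparse); y: numpy array of labels.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    slices = np.array_split(order, n_members)

    committee = []
    for idx in slices:
        member = LogisticRegression(max_iter=1000)
        member.fit(X[idx], y[idx])
        committee.append(member)

    # Iteration-final model: uses the entire labeled training set
    final_model = LogisticRegression(max_iter=1000).fit(X, y)
    return committee, final_model
```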

FIG. 3 illustrates additional details of example elements of a training subsystem of a classification service, according to at least some embodiments. In the depicted embodiment, a search subsystem 302 within a training subsystem (similar to that discussed earlier in the context of FIG. 2 and FIG. 1) may include an item attribute analyzer 304 which is used to create an inverted index 306 on attribute values identified from the data items. Results 382 obtained from an active learning subsystem 320 may be used at the search subsystem 302 to generate a static rank among potential candidates for labeling feedback in some embodiments. The inverted index 306 as well as the static rank information 310 may be used as input to a regression model or regressor 312 which is trained to identify attributes that are highly correlated with membership in one or more classes. The output of the regressor 312 may be used to generate search recommendations 384 in some embodiments—e.g., token sets that are recommended as search predicates when a user fills out a search input form via an interface presented by the classification service. In at least one embodiment, recommended search terms may be combined with auto-complete and/or auto-correct features of the search input interface—e.g., in a scenario where the term “scooter” is identified by the regressor as being highly correlated with membership in a particular class, and a user types in the letters “sc” in a search term entry box, the term “scooter” may be listed among the alternative recommended search terms that can be selected by the user. A final ordering of the set of data items to be presented as labeling feedback candidates may be performed by a ranker module 308, which may use the search query terms entered by users, the item labels generated by users, the inverted index, and/or the static rank obtained from the active learning subsystem as inputs in the depicted embodiment. As indicated in FIG. 3, the final ranking produced by the ranker 308 may be based on a combination of factors in at least some embodiments, including but not necessarily limited to the static rank 310 obtained from the active learning results, features extracted from terms of search queries 381, additional features extracted from the attributes of the data items, detections of potentially incorrect labels obtained from label providers, and the like. As such, the final ranking may be considered an example of dynamic ranking (rather than purely static ranking) in such embodiments. The displayed search results 383 may be presented in an order selected by the ranker 308 in various embodiments. In some embodiments, a slightly different approach may be used to determine the order in which labeling candidate items should be presented to users via the classification service's interactive interfaces. First, a determination may be made as to whether a given label provider has indicated one or more filters (e.g., via the search input interface, via label selection interfaces, via indicated ranges of classification scores, or the like), and a set of candidate data items may be identified based on the filters.
Next, from among the filter-based set of candidate data items, results obtained from the active learning subsystem may be used to identify the candidates likely to be most helpful in learning, and the items may be arranged in order of decreasing influence on learning before being presented to the users. Thus, in different embodiments, the manner in which user-specified filters and active learning results are used to arrange labeling feedback candidate data items may differ.
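
A minimal sketch of this two-stage, filter-then-rank ordering, assuming the filters arrive as item predicates and that a per-item influence score (e.g., committee score variance) is available; both parameters are illustrative.

```python
def order_candidates(items, filters, influence_score):
    """Two-stage ordering: apply the label provider's filters first,
    then sort the survivors by estimated influence on learning.

    filters: iterable of callables (item -> bool);
    influence_score: callable (item -> float).
    """
    filtered = [item for item in items
                if all(f(item) for f in filters)]
    # Most influential candidates are presented to the user first
    return sorted(filtered, key=influence_score, reverse=True)
```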

A variety of active learning approaches may be employed in different embodiments to help select candidate data items for labeling. In the depicted embodiment, a bagger 322 may assign different combinations of labeled data items among the respective training sets of members of a committee 324 of classifiers, and the variation among the classes predicted for a given unlabeled data item by the different members may be used as an indication of the potential benefit of labeling the item. For example, consider two unlabeled data items UD1 and UD2, a committee comprising four classifiers C1, C2, C3 and C4 trained using respective training subsets, and a binary classification scenario in which an individual data item is either assigned a “1” or a “0” to indicate one of the two possible classes by a given classifier. Assume further that the binary classification comprises generating a real-valued classification score in the range zero to one, and that an item is predicted as being a member of class “1” if the score exceeds 0.5, and as being a member of class “0” if the score is less than or equal to 0.5. Assume further that the scores generated for UD1 by the four members of the committee are 0.8, 0.75, 0.9 and 0.87, and the scores assigned to UD2 are 0.33, 0.67, 0.8, and 0.2. Because the scores for UD1 are more consistent with respect to one another (i.e., the score variance is low), one may infer that it is “easier” to classify UD1 as a member of class “1”. In contrast, the variance of the scores is higher for UD2, so UD2 may be considered harder to classify, and therefore a better candidate for obtaining labeling feedback from a label provider. By selecting harder candidates for labeling feedback sooner in the training process, the speed with which the iteration-final model learns may be increased relative to scenarios in which easy-to-classify items are used sooner than harder-to-classify items in various embodiments. Note that metrics other than variation in the classification scores may be used to rank candidates in some embodiments—e.g., proximity to a class boundary may also or instead be used. If, for example, the scores for UD2 were 0.52, 0.51, 0.54, and 0.55 in the above example, the variation for UD2 would be less than the variation for UD1, but the proximity of the scores to the class boundary score (0.5) may still lead to ranking UD2 over UD1 as a labeling feedback candidate. In some embodiments, other active learning approaches such as uncertainty sampling, expected model change algorithms, expected error reduction algorithms, variance-reduction algorithms, and/or density-weighted algorithms may be used at active learning subsystem 320. Combinations of such algorithms and/or the committee-based approach shown in FIG. 3 may be employed in at least one embodiment. At least in some embodiments, a data item need not necessarily be unlabeled to be selected as a labeling feedback candidate—e.g., reconsideration of a previously-assigned label may be requested from a label provider in some cases, especially if the predictions of the model do not match the previously-provided label. In some embodiments, k-fold cross-validation may be used (e.g., with k members of the committee 324) at the active learning subsystem.
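
The worked numbers above can be checked with a small sketch that blends score variance and boundary proximity into a single priority; the specific weighting is an assumption, since the text leaves the exact combination open.

```python
import numpy as np

def labeling_priority(scores, boundary=0.5, w_var=1.0, w_prox=1.0):
    """Blend committee-score variance with proximity to the class
    boundary into one labeling priority (weighting is illustrative).
    """
    s = np.asarray(scores, dtype=float)
    variance = s.var()
    # 1.0 when the mean score sits exactly on the boundary, 0.0 at
    # either extreme of the score range
    proximity = 1.0 - abs(s.mean() - boundary) / boundary
    return w_var * variance + w_prox * proximity

ud1 = [0.8, 0.75, 0.9, 0.87]        # consistent scores: easy item
ud2 = [0.33, 0.67, 0.8, 0.2]        # high variance: hard item
ud2_alt = [0.52, 0.51, 0.54, 0.55]  # low variance but near boundary

assert labeling_priority(ud2) > labeling_priority(ud1)
assert labeling_priority(ud2_alt) > labeling_priority(ud1)
```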

The training/evaluation subsystem 340 for the iteration-final model may comprise a trainer 342 and an evaluator 346 in the depicted embodiment. The trainer 342 may, for example, generate a trained classifier version 344 using a training set comprising all the currently available labeled data items, from which a set of iteration-final predictions 348 may be obtained on an evaluation data set. One or more classification metrics may be computed by the evaluator 346 and included in the set of displayed metrics 385 in some embodiments. A set of important attributes 386 (e.g., text tokens or other attributes that are found to be highly correlated with membership in various classes) of data items may be identified in at least some embodiments and included in the visualization data sets presented to various users of the classification service (e.g., by highlighting the attributes in the set of displayed items presented to a label provider). In at least some embodiments, metrics and associated status or diagnosis test results obtained by evaluator 346 may also be used by the ranker when generating the candidate data items for which labeling feedback is to be requested—e.g., data items which, if labeled, would lead to achieving a particular as-yet-unmet objective associated with a metric may be ranked higher than data items which would be less likely to help meet the objective.

Timing of Label Submissions Relative to Training Iterations

FIG. 4 illustrates example submissions of batches of labels to a classification service, asynchronously with respect to the start and end of classifier training iterations, according to at least some embodiments. Two timelines are shown to indicate the relationships between the timing of operations at the back-end training resources of the classification service (which may have features and capabilities similar to those discussed above) and the timing of labeling feedback provided to the service by label providers. Back-end training resources timeline 402 shows training iteration #K starting at some time T0 and continuing until time T1, training iteration #(K+1) starting at T1 and continuing until T2, and iteration #(K+2) starting at T2 and ending at T3 in the depicted embodiment.

Along labeling feedback providers' timeline 404, the timing of two types of events during the various training iterations is illustrated. Shortly after a given training iteration completes and a new training iteration is started, a new set of labeling feedback candidate data items may be identified (e.g., using results generated by the classifiers trained in the just-completed iteration), e.g., at times (T0+delta), (T1+delta), (T2+delta) and (T3+delta). At least a subset of the newly-identified candidates may be presented to individual ones of one or more label providers in the depicted embodiment, e.g., in a batched or paged manner, with some small number of candidates being displayed in order of decreasing estimated influence on classifier learning as discussed earlier. Different label providers may submit batches of one or more labels for the candidates presented to them, and/or other types of feedback such as search queries or other filtering requests, at arbitrary points in time with respect to the start and end times of the training iterations in the depicted embodiment. For example, such feedback may be received at times f1, f2, f3 and f4 between T0 and T1, at times f5, f6, f7 and f8 between T1 and T2, and at times f9 and f10 between T2 and T3 in the depicted example scenario. The specific times at which feedback is provided may depend, for example, on various factors such as how difficult it is to decide on labels for individual data items that are presented to a given label provider, how much labeling assistance (e.g., via highlighting of important attributes of the data items) the classification service is able to provide for various data items, the potentially differing capabilities or interest levels of the label providers, how busy the label providers may be with other work, and so on. In at least some embodiments, the labels received during a given training iteration may be collected and used to help train the classifiers for the next iteration. Search and other filter requests may be used to select and/or order candidates for presentation to label providers during the current and/or future training iterations in at least some embodiments. For example, consider a scenario in which 50 labeling feedback candidate data items have been selected and ordered for presentation to a given label provider in groups of 10, starting at time T1+delta in the example of FIG. 4. If, after viewing the first 10 candidates, the label provider submits a search request or some other filtering request, the order in which the remaining 40 items are presented may be changed, or in some cases a different set of items may be identified for presentation to the label provider based on the filter/search feedback and/or on active learning results pertaining to the filter/search results in at least one embodiment. Note that if a number of candidate data items has been identified for presentation to a particular label provider at a given point of time, in at least some embodiments this does not necessarily mean that the particular label provider is required to provide labels for all these candidates before the next training iteration can begin. The classification service may, for example, decide to initiate the next training iteration even if some of the candidates have not yet been labeled, or may attempt to obtain labels from multiple label providers for the same data item in some embodiments, and use one of the submitted labels (e.g., a label selected by the majority of the label providers that submitted labels for a given candidate data item).
In various embodiments, operations illustrated along both timelines shown in FIG. 4 may collectively be considered part of the training iterations—that is, resources used for computations at the back end and resources involved in interactions with users may both contribute to a given training iteration.

Evolution of Labeling Candidate Selection Criteria

In at least some embodiments, the contributing factors used for identifying and ranking labeling feedback candidate data items may change during the training process used for a given classification problem. FIG. 5 illustrates an example of changing criteria for selecting labeling feedback candidates over time during classifier training, according to at least some embodiments. Along timeline 502, new sets of labeling feedback candidates may be identified at times T0, T1 and T2, which may in some cases correspond approximately to the completion/start times of respective training iterations. At time T0, a first set of candidate selection criteria may be used. Between T0 and T1, however, a number of events that may influence the criteria used for selecting the next set of candidates may occur in the depicted embodiment—e.g., new asynchronous feedback may be received in the form of filters or search requests, new unlabeled data items may be obtained from the data sources being used, new model progress metrics may be collected, and so on. As a result, the selection criteria 520 employed at T1 may differ from those employed at T0. Similarly, between T1 and T2, additional feedback may be received from label providers, data scientists analyzing the progress of the classifiers, and the like, or new data items may be retrieved, and the criteria 530 used for selecting the next set of candidates may differ from those used at T1 (and/or those used at T0). In at least one embodiment, the classification service may adjust its criteria as soon as any new feedback or data is obtained—that is, a policy of continuous adjustment of selection criteria for labeling feedback may be employed, enabling even faster attainment of training sets that result in high-quality classifiers.

Adjustable Automation Levels

FIG. 6 illustrates example modes of classifier training with respect to the extent to which decisions are made in an automated manner by the classification service, according to at least some embodiments. In the depicted embodiment, a client or user of the classification service (which may have features similar to those discussed in the context of FIG. 1) may use an interactive control similar to a sliding scale 602 to specify the level of automation desired with respect to various decisions made during the training of one or more classifiers. At one extreme, a fully-automated mode 620 may be selected, in which the classification service's back-end subsystems make most of the decisions, such as exactly which types of models are to be used for classifiers, the settings for various hyper-parameters of the training process, the techniques/algorithms to be used to select candidates for labeling feedback, the set of metric thresholds and/or diagnosis tests used to terminate training, the set of training enhancement actions (if any) to be undertaken at various stages of the process, and so on. At the other extreme, a user or client skilled in machine learning may decide to provide input on many such decisions, or at least to view and have a chance to approve/disapprove the decisions recommended, and may therefore opt to use the classification service in manual mode 610. In at least one embodiment, one or more intermediate modes 630 may also be selectable, e.g., with an interface element that can be used to determine which types of decisions are automated and which require or use client input. In effect, in the depicted embodiment, an indication may be obtained via an interactive interface of the level of automation to be implemented at one or more stages of a classification workflow, and depending on the desired level of automation, parameters for various decisions (such as when to terminate classifier training) may be chosen by the service without necessarily requiring input or guidance from a user.

In some embodiments, more granular control over automation levels may be provided, in which clients are shown a list of decision types and allowed to choose a subset of decision types on which they wish to provide input. In one embodiment, interface elements other than sliding scales may be implemented to enable users to decide the levels of automation—e.g., radio-knob-style interfaces may be implemented, or respective automation on/off checkboxes corresponding to the different types of decisions may be implemented. In various embodiments, using the kinds of automation adjustment interfaces shown in FIG. 6, the needs and capabilities of a wide variety of users may be accommodated by the classification service, including for example machine learning experts and subject matter domain experts, as well as individuals who have relatively little experience with machine learning or classification. In at least some embodiments, the level of automation may be changed during the course of training a given classifier using an interface similar to that shown in FIG. 6—e.g., early on during the training, a lower level of automation may be used, and as the training progresses more and more of the decision-making responsibilities may be handed over to the classification service by increasing the level of automation.

Example Interactive Interface

A number of different types of programmatic interfaces may be used for interactions between clients or users and a classification service of the kind discussed above in various embodiments. FIG. 7 illustrates an overview of an example interactive interface which may be used to display labeling feedback candidates and obtain labels to be used for training classifiers, according to at least some embodiments. In the depicted embodiment, interactive interface 701, which may for example be presented as a web page or as a graphical user interface (GUI), may comprise a scrollable ordered feedback candidate data item region 712, as well as numerous other panels or regions indicating various aspects of the classification development workflow. Various portions of the content displayed via the interactive interface 701 may be generated as part of a visualization data set by the classification service and transmitted to a client-side device for presentation to the client in some embodiments.

Within the scrollable region 712, information about a number of candidate items 714, such as items 714A or 714B, for which labeling feedback may be provided by the viewer if desired, may be shown in the depicted embodiment. In the example scenario shown in FIG. 7, for each of the items, an item image 716 (e.g., 716A or 716B) may be shown towards the left of the display, a central region may comprise the item title 719 (e.g., 719A or 719B) and description details 717 (e.g., 717A or 717B), and a set of additional item attributes 718 (e.g., 718A or 718B) may be presented at the right. In at least some embodiments, audio/video recordings or segments 777 providing additional information about the data item may be included. Generally speaking, the specific data types and formats of the information pertaining to various data items included in the views presented via an interactive interface 701 may vary, e.g., among the different data items for a given classification problem and/or from one classification problem to another. An interface element 715 (e.g., 715A or 715B) may be used to provide a label for the item in the depicted embodiment. The order in which the data items are arranged in region 712 may, for example, be based at least in part on a respective estimated rank, with respect to a metric such as an estimated impact on learning, associated with including individual ones of the data items in a training set for one or more training iterations of one or more classification models in the depicted embodiment. In various embodiments, in effect, the items displayed may be selected and presented in an order based on the extent to which they may contribute to faster learning and convergence of the classifier being generated, with those items that are estimated to provide greater benefits towards learning being presented before items that are expected to provide smaller benefits. In the scenario depicted in FIG. 7, for example, item 714A may be presented before item 714B under the assumption that the positive impact or learning contribution, with respect to the quality of the classifier obtained in one or more subsequent training iterations, of providing a label for item 714A may exceed (or at least be no smaller than) the positive impact of providing a label for item 714B. To help the user make label selection decisions, in at least some embodiments, terms that are correlated with membership in a given class may be highlighted in the item title 719, description details 717 and/or summarized attributes 718. Examples of such highlighting techniques are provided below.

In addition to the candidate items themselves, the interactive interface 701 may include a title 702 identifying the current classification task, a summary 741 of a set of diagnosis tests that may be used to help decide whether to terminate training of the classifier, a label filter element 704, a current class distribution ribbon element 706, a search term entry form element 708, a search result element 722, an update timestamp indicator element 724, a label-all option element 726 and/or a submit interface element 728 in the depicted embodiment. Note that at least some of the interface elements and regions shown in FIG. 7 may not be required in some embodiments, or may be arranged in a different layout than that shown.

The diagnosis summary 741 may indicate how many (or what fraction of) a selected set of diagnosis tests have been met in the most recent training iteration. The label filter 704 may be used to indicate to the classification service whether items that are currently being predicted as being members of a particular class (or have earlier been labeled as members of a particular class) should be presented to the user next. The current class distribution ribbon may provide a visualization of the manner in which the data items for which predictions have been generated are distributed among the set of classes, where the class boundaries lie, and so on. The entry form 708 may be used to enter search predicates or queries that are to be used to filter data items for presentation to the client. The number of items that were identified in response to the previously-submitted search, as well as the breakdown of those items among the classes, may be presented in element 722. A timestamp indicating the time of completion of the last training iteration (or the most recent time at which the user has submitted feedback) may be indicated in element 724, together with interface elements to view more detailed historical information with respect to the training iterations. The label-all option element 726 may be used to label all the items currently selected (or currently being displayed) with a particular label, e.g., instead of the user having to select the same label for each item separately. The submit interface element 728 may be used, as indicated by its name, to submit the set of labels that the user has currently indicated for the individual items being displayed to the classification service back end. Additional details regarding various aspects of these elements are provided below.

Attribute Highlighting

FIG. 8 illustrates examples of the use of highlighting to distinguish terms or tokens within labeling feedback candidates displayed via an interactive interface, according to at least some embodiments. In the example scenario depicted in FIG. 8, a binary classification model (for classes A and B) is being developed, and the example data items being used to train and test the binary classifier comprise a plurality of text token sets in addition to an item image 816. A token set may comprise some number of text tokens (such as words, punctuation, and the like) in the depicted embodiment. In the example scenario, the user has submitted a search query, and information about a particular item which is displayed via the interactive interface after the search query has been submitted is shown. As indicated in the search entry form box 808, the search predicate submitted by the user comprises a token set “tokenSet3” in the depicted example. Thus, the user has indicated that results for a search for items which comprise tokenSet3 in one or more attributes (or some set of tokens similar to tokenSet3) should be displayed, if such items are found by the classification service.

Among the items displayed to the user via the interactive interface, at least one item 814 which includes the searched-for token set tokenSet3, and which is not yet labeled, may be included in the depicted embodiment. In order to help the user provide a label for item 814, token sets 810 whose presence in an item's attributes has a high correlation with membership of the item within class A (as determined using the versions of the classifier that have been trained thus far) may be highlighted in a particular color C1, and token sets 811 whose presence in an item's attributes has a high correlation with membership of the item within class B may be highlighted in a different color C2. Furthermore, the occurrences of the searched-for terms 812 (tokenSet3 in the example shown) may be highlighted using a third color C3 in the depicted embodiment, as indicated in legend 890. The item title 819 and the description details section 817 include tokenSet2, which is correlated with class A membership, and the additional summarized attributes section 818 includes tokenSet57, which is also correlated with class A membership. TokenSet54, which is correlated with class B membership and present in the description details section, is highlighted in color C2. TokenSet3, in the title 819 and the description details section 817, may be highlighted in the third color C3 to indicate that it corresponds to the searched-for terms. In some embodiments, relevant non-text portions of the information may also or instead be highlighted—e.g., as shown, portion 870 of image 816 may be highlighted to indicate a correlation of that portion of the image with class A. Similarly, in scenarios in which audio or video information about data items is presented, portions of the audio recording/segment or video which are correlated with target classes or search terms may be highlighted in at least one embodiment.
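
The class-correlated token sets that drive such highlighting could be identified along the following lines; logistic-regression coefficient signs are used here as one plausible correlation measure, since the text does not mandate a specific one, and scikit-learn is an assumed dependency.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def class_correlated_tokens(texts, labels, top_k=10):
    """Return the token sets most indicative of each class of a
    binary problem. Labels are assumed to be "A" and "B"; under
    scikit-learn's conventions the coefficients correspond to the
    lexicographically larger class ("B").
    """
    vectorizer = CountVectorizer(ngram_range=(1, 2), binary=True)
    X = vectorizer.fit_transform(texts)
    model = LogisticRegression(max_iter=1000).fit(X, labels)
    tokens = vectorizer.get_feature_names_out()
    order = model.coef_[0].argsort()
    return {
        "class_B": [tokens[i] for i in order[-top_k:][::-1]],  # largest coefs
        "class_A": [tokens[i] for i in order[:top_k]],         # most negative
    }
```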

In addition to using highlighting and/or different colors for attribute elements correlated with class membership, other types of visual signals may also be displayed in the depicted embodiment. For example, if the user selects class A for the label 822 (or if the item being displayed was previously labeled as a class A member), the item background color 815 may be set to a light shade of color C1 in the depicted scenario, while if the user selects class B, the item background color 815 may be set to a light shade of color C2. As a result of these and similar visual cues, users may be able to grasp various characteristics of the displayed candidates at a glance—e.g., whether the item's attributes (such as the title 819, the details 817 or the additional summarized attributes 818) are dominated by terms correlated with a particular class, how frequently the searched-for terms occur in the items, and which (if any) of the items being displayed have been labeled already as members of one class or another. Such visual signals may help to substantially simplify the work of label providers, classifier task managers (e.g., individuals who initiated the classification workflow), individuals responsible for debugging/analyzing the classification workflow, and so on in various embodiments. Various other types of visual cues (e.g., the use of different fonts in addition to or instead of different colors, or varying intensities of colors to indicate the extent of correlation or matching with search terms) and/or other modes of cues (such as audio cues or tones representing the extent of correlation with the different classes when a particular token is hovered over with a mouse or other interface) may be used to provide similar types of information in various embodiments to help various types of users. Note that although visual cues have been indicated for binary classification by way of example in FIG. 8, similar techniques may be applied with equal success to multi-class classification problems in various embodiments.

Label Reconsideration Requests

In at least one embodiment, the classification service may generate class predictions for at least some data items for which labels have already been provided, e.g., in order to determine the extent to which the classifier differs in its conclusions from the label providers. FIG. 9 illustrates an example scenario in which a label provider may be requested, via an interactive interface, to reconsider whether a previously-provided label is appropriate for a labeling feedback candidate, according to at least some embodiments. In the depicted example, a data item 914 with item image 916 and title 919 has been given a label corresponding to class B of a binary classification problem (where the other class is class A), as indicated in element 922. The background color 915 may therefore be a light shade of the color C2 corresponding to class B.

Although the item 914 is currently designated as a member of class B, a number of token sets 910 that are highly correlated with class A (and are therefore highlighted using color C1) may have been identified by the classification service, based at least in part on analysis performed using one or more versions of the classifiers trained thus far, and no token sets that are highly correlated with class B may have been found in the depicted scenario. For example, tokenSet2 in the item title 919 and in the details section 917, tokenSet9 and tokenSet54 in the details section 917, and tokenSet37 and tokenSet57 in the summarized attributes section 918 may all be highly correlated with class A membership. Furthermore, in the depicted embodiment, the classification service may have computed a high predicted score (e.g., 0.8) for class A membership for the item 914. A suggestion or request 920 for the user to reconsider the previously-provided class B label 922 may be included in the presented visualization data set in the depicted embodiment. In some embodiments, a prediction score which indicates that the current user-suggested label is incorrect may be indicated in the reconsideration request, while in other embodiments such a score may not necessarily be displayed. Other types of cues, such as a background color for the item information which suggests that the current label may potentially be inaccurate, may be used as indicators of a reconsideration request in some embodiments. Note that, at least in one embodiment, the request to reconsider a previously-supplied label may be sent to a different individual/user than the source of the previously-supplied label—e.g., to one of a set of trusted individuals who are permitted to change previously-provided labels.

User-Defined Labels and Recommended Searches

FIG. 10 illustrates examples of interface elements that may be used to indicate user-defined labels and recommended token sets for searches, according to at least some embodiments. The classes to which data items are to be eventually assigned (e.g., Class A and Class B in the binary classification examples shown in FIG. 8 and FIG. 9), which may be selected by the initiator or manager of the classification workflow, may be referred to as the target classes in some embodiments. In various embodiments, as a label provider or other user examines a set of labeling feedback candidate data items via an interactive interface of the classification service, they may notice similarities among a subset of the items, while still being unable to definitively decide on assigning the similar items to a particular target class of the classifier being developed. To help keep track of the similarities identified among such data items, user-defined labels may be created and stored, at least temporarily, for the items in various embodiments.

Such user-defined labels may be used as filter or search predicates in at least one embodiment. For example, when submitting a filtering request to the classification service for the set of data items to be presented next, a user may use a label filter 1004 of the type shown in FIG. 10. A drop-down menu of the currently defined/assigned labels, as well as an interface element which can be used to add a new user-defined label, may be presented in response to a programmatic interaction such as a mouse click within the label filter interface element 1004. The set of label filtering options presented may include the target classes such as Class A, Class B and Class C, the “Unlabeled” category (which may be used in the depicted embodiment for those data items which have not yet been designated as members of a particular class, either by label providers or by the classification service), as well as zero or more user-defined labels such as U1 and U2 in the depicted example scenario. Interface elements such as checkboxes 1010 may be provided to enable a user to select the set of labels/categories to be used to filter the next set of data items presented to the user in various embodiments.

In some embodiments, if and when a user eventually decides that all the data items that were assigned a particular user-defined label should be designated as members of a particular target class, a label filter 1004 may be used to retrieve all the items to which the user-defined label was assigned, and an interface element similar to the “label-all option” shown in FIG. 7 may be used to assign the data items in bulk to the particular target class. Such an approach may enable the label providers to avoid labeling such data items one at a time, thereby further enhancing the user experience of the label providers. In at least some embodiments, if a label provider decides that all the items with a user-defined label are to be assigned to a target class, the metadata stored at the classification service regarding the user-defined label may optionally be deleted—that is, information about user-defined labels may only be retained/stored for periods during which a decision about the target class of the items assigned the user-defined label has not yet been made. Such an approach may help to reduce the memory and storage resources required at the classification service in at least some embodiments, while still enabling users to take advantage of the user-defined label feature. In effect, in various embodiments, user-defined labels may serve as the equivalent of customizable annotations of various label providers, which can help them perform their tasks in a more streamlined manner. For example, consider a scenario in which a binary classification task comprises labeling whether an individual whose medical records are being examined suffers from a particular disease or not. While viewing data items that show demographic information, results of various medical tests, reports of symptoms and the like for various individuals, a label provider may notice that several of the individuals have a combination of a particular age range and a particular set of symptoms—e.g., the individuals are all between 40 and 50 years old and all exhibit a particular symptom S1. In such a scenario, a user-defined label “between40-50-withSymptomS” may be defined and used as described above as a custom annotation for such individuals' data items. Later, based on the decision reached by the label providers, all the individuals to whom the user-defined label was assigned may potentially be labeled in a single interaction with one of the target class labels if desired.

In the embodiment depicted in FIG. 10, a text box interface for a search term filter 1024 may be implemented. The search predicates transmitted to the classification service may be decided, for example, based on the combination of text entered in the text box, a set of recommended influential terms or token sets identified by the classification service, an auto-complete feature, and the like in the depicted embodiment. During various training iterations, the classification service may identify a set of tokens or terms whose presence in a data item is highly correlated with difficulty of classifying the item, and such token sets may be included as recommended search terms (e.g., recommended search terms #1, #2 and #3 in the example scenario shown in FIG. 10). If the user types in the first letter or first few letters of such a recommended search term, or letters that are similar to the recommended search terms, the recommended search terms may be shown as options for the search term filter, thereby enabling the user to potentially reduce the amount of text that has to be entered in the depicted embodiment to identify items for which providing labels would be most beneficial. In some embodiments, the user may not even have to enter any text before one or more recommended search terms are presented as options—e.g., such terms may be presented via a drop-down menu element as soon as the user clicks in a search term entry box. As the back end of the classification service learns more about the classification task being performed, the use of the recommended searches may help simplify the task of the label providers by helping them to focus on the more important (from the perspective of improving the quality of the classifier) data items. The presentation of the recommended search terms may also help the user (e.g., a label provider or a data scientist debugging the classifier) get a better sense of what the classification service back end has already learned, in the training iterations that have been completed thus far, regarding the classification problem being addressed in various embodiments. For example, by choosing a recommended search term and viewing the data items that include the recommended search term, a data scientist debugging the classifier may determine the kinds of attributes that cause classification scores of data items to be predicted close to the current class boundaries in some embodiments.

In at least some embodiments, recommended search terms may not necessarily be used to help users view difficult-to-classify data items—instead, for example, some recommended search terms may be highly correlated with class memberships, thereby enabling a user to view characteristics of data items that make the data items more easily classified. In one embodiment, a user may be able to specify, via a programmatic interface, the kinds of recommended search terms to be presented—e.g., whether recommendations for terms that lead to the display of difficult-to-classify items should be provided, recommendations for terms that lead to the display of easy-to-classify items should be provided, or both. Generally speaking, in various embodiments, recommended search terms or predicates may be identified based on the analysis of one or more metrics. For example, in one embodiment the variance of predicted class scores from an active learning committee of classifiers among items that contain a given search term may be used to identify the recommended search terms, in which case higher variance may represent greater difficulty of classification, and lower variance may represent greater ease of classification.
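
A back-end sketch of that variance-per-term criterion follows; the dictionary-based data structures and function name are illustrative assumptions, not details taken from the figures.

```python
import numpy as np

def recommend_search_terms(term_to_items, committee_scores, top_k=3,
                           prefer_hard=True):
    """Rank candidate search terms by the average committee-score
    variance of the items containing each term: high variance marks
    terms associated with hard-to-classify items, low variance with
    easy ones.

    term_to_items: dict term -> iterable of item ids
    committee_scores: dict item id -> list of committee scores
    """
    ranked = []
    for term, ids in term_to_items.items():
        variances = [np.var(committee_scores[i]) for i in ids]
        if variances:
            ranked.append((term, float(np.mean(variances))))
    # prefer_hard=True surfaces hard-to-classify terms first
    ranked.sort(key=lambda pair: pair[1], reverse=prefer_hard)
    return [term for term, _ in ranked[:top_k]]
```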

Class Distribution Information

FIG. 11 illustrates an example interactive interface element for displaying class distribution information, according to at least some embodiments. In effect, in the depicted embodiment, a three-dimensional set of data pertaining to the distribution of predictions obtained during a particular set of one or more training iterations (such as the most recently completed training iteration) for a binary classifier may be provided using a combination of a ribbon-like visualization and variations of color hues and intensities. The long edge of the ribbon (a long rectangular shape which may, for example, be positioned within a web-based interactive interface similar to that shown in FIG. 7) may comprise a binary class prediction score value axis numbered from 0 to 100 (with the 0 and 100 values being implied instead of explicitly displayed, as they can be inferred from the 10, 20, . . . , 90 values). If the two binary classes being considered are “positive” and “negative”, for example, a score closer to 100 for a given data item may represent a higher probability that the data item is predicted as being part of the positive class in the depicted embodiment, while a score closer to 0 for a given data item may represent a higher probability that the data item is predicted as part of the negative class.

Small colored rectangles 1140 and 1142 may be placed at various positions within the ribbon 1104 in the depicted embodiment. The position along the 0-100 scale of a colored rectangle may indicate that some number or fraction of data items have been assigned the corresponding prediction scores—e.g., the three rectangles 1140 at the left may indicate that some number or fraction of data items have been assigned scores strongly indicative of the negative class, while the two colored rectangles 1142 between 70 and 100 may indicate that some number of data items have been assigned scores indicative of the positive class. Different colors may be used for positive and negative classes: e.g., red may be used for negative-class data items, while green may be used for positive-class data items in the depicted embodiment. To convey information about the relative density of the distribution with respect to each colored rectangle, in at least some embodiments the lightness/darkness or intensity of the hue may be used—e.g., a lighter/weaker shade of green or red may be used to indicate that a smaller number of data items, and a darker/stronger shade that a larger number of data items, have been assigned the corresponding range of scores. If, in one trivial example scenario, only a single intense red rectangle were shown at the “0” end of the scale, and only a single intense green rectangle were shown at the “100” end in the depicted embodiment, this would imply that, as of the training iteration whose results are being represented via the ribbon, a large number of data items are predicted as “extremely negative”, and a large number of data items are predicted as “extremely positive”, with few or no data items in between. If, in contrast, colored rectangles of uniform color intensity occupied the entire ribbon, with no empty regions, this may indicate a fairly uniform distribution of scores between 0 and 100, indicating that, per the current version of the classifier, the data items for which predictions have been generated are uniformly distributed among extremely positive, moderately positive, borderline, moderately negative, and extremely negative portions of the binary classification spectrum in the depicted embodiment.
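
The ribbon's contents amount to a histogram of 0-100 prediction scores, with class color chosen by the side of the boundary and hue intensity scaled by item count. A back-end sketch follows (rendering is left to the front end; the bin count and segment structure are assumptions):

```python
import numpy as np

def ribbon_bins(scores, n_bins=20):
    """Summarize 0-100 class prediction scores into ribbon segments:
    color from the side of the 50 boundary, intensity from item count.
    """
    scores = np.asarray(scores, dtype=float)
    counts, edges = np.histogram(scores, bins=n_bins, range=(0, 100))
    max_count = counts.max() if counts.size and counts.max() else 1
    segments = []
    for count, left in zip(counts, edges[:-1]):
        if count == 0:
            continue  # empty regions of the ribbon stay blank
        color = "green" if left >= 50 else "red"  # positive vs. negative
        intensity = count / max_count             # darker = more items
        segments.append({"start": float(left), "color": color,
                         "intensity": round(float(intensity), 2)})
    return segments
```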

A moveable and/or otherwise adjustable (e.g., expandable) zoom-in request interface element 1108 may be provided in the depicted embodiment, e.g., to enable users who are interested in finer-granularity details of the class distribution to view another ribbon showing details of the portion of the ribbon covered by the zoom-in request. Thus, for example, if a user clicks on the zoom-in request element as it is positioned in FIG. 11, the portion of the score range between approximately 62 and 76 may be shown at finer granularity.

Class Boundary Information

In some embodiments, a class distribution ribbon similar to that discussed above may also provide information about class boundaries, which may, at least in the case of binary classification, in turn indicate the fraction of data items that have not yet been classified to a desired confidence level. FIG. 12 illustrates example interactive interface elements that indicate the fraction of training observations whose class has not yet been determined, according to at least some embodiments. As in the case of FIG. 11, a classifier for a binary classification problem is assumed to be under development in the two example scenarios shown in FIG. 12. For each scenario, a respective class distribution ribbon (1204A or 1204B) is shown, with a pair of class boundary indicators. The left indicator 1206A or 1206B may indicate the range of predicted classification scores that are classified as negative with a selected confidence level such as 95%, while the right indicator 1208A or 1208B may indicate the range of predicted classification scores that are classified as positive with the selected confidence level.

The distance between the left and right indicators may provide an (at least approximate) indication of the fraction of data items for which a classification prediction with a targeted confidence level has not yet been generated in the depicted embodiment. Thus, because the class boundary indicators are further apart in ribbon 1204B than in ribbon 1204A, a greater fraction of data items may remain to be classified in the scenario depicted using ribbon 1204B than in the scenario depicted using ribbon 1204A. Note that, at least in some cases, there may be some labeled data items (such as items 1209) which lie between the current class boundary indicators, indicating for example that while such items have been assigned predicted class scores, the items have not yet been placed in one of the target classes with the desired confidence levels. The class boundary indicators may thus provide a simple representation of the certainty or confidence levels of the current state of the classifier in at least some embodiments.
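
One simple, empirical way to place such boundary indicators is to sweep score thresholds over labeled validation data until the desired confidence is met on each side, as in the sketch below; real systems might instead use calibrated probabilities, and all names here are illustrative.

```python
import numpy as np

def class_boundaries(scores, labels, confidence=0.95, step=0.01):
    """Estimate the score thresholds below/above which items can be
    called negative/positive at the target confidence.

    scores: array of 0-1 prediction scores; labels: array of 0/1
    ground-truth labels for a validation set.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)

    left = 0.0
    for t in np.arange(0.0, 0.5, step):
        below = labels[scores <= t]
        if below.size and (below == 0).mean() >= confidence:
            left = t   # highest threshold still "negative enough"
    right = 1.0
    for t in np.arange(1.0, 0.5, -step):
        above = labels[scores >= t]
        if above.size and (above == 1).mean() >= confidence:
            right = t  # lowest threshold still "positive enough"
    return left, right  # items between the two remain unclassified
```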

Summarized Metrics Status and Historical Trends Visualization

FIG. 13 illustrates an example interactive interface element that provides summarized information about a set of status indicators, according to at least some embodiments. Such an element may be included as part of a web-based or other graphical interactive interface of a classification service in various embodiments, as also indicated in FIG. 7. An indication of the last update to the classifier (e.g., when the training of the iteration-final classifier of the most recent training iteration was completed) may be provided in an update timestamp/history information element 1302 of the interface in the depicted embodiment. A history request button 1304 may be used to submit a request for historical information pertaining to one or more classification model metrics; examples of the kinds of historical information that may be presented in some embodiments are provided below, e.g., in the context of FIG. 14.

In at least one embodiment, a high-level summary 1306 of selected model metrics may be provided, e.g., in response to a mouse click on the update timestamp and history information element 1302. Metrics summaries for a binary classification problem are shown by way of example in FIG. 13. The summary 1306 may, for example, indicate the total number of labels (approximately 3.3K or 3300 in the depicted scenario) that have been obtained at the classification service for the classification task being addressed, and provide a breakdown for the different classes (986 negative and approximately 2.3K or 2300 positive). PPV (positive predictive value) and NPV (negative predictive value) values with confidence intervals may be provided in the summary in some embodiments, as well as information about the current coverage level (e.g., the number of items that have been classified with the desired confidence levels, as well as the fraction that are as yet unclassified). In at least some embodiments, users may add new metrics to the list of metrics for which summarized and/or trend information is to be provided, or remove existing metrics, e.g., using the “add/remove metric” element 1308 in the depicted embodiment. Any of a wide variety of metrics may be added/removed in different embodiments, depending on the problem domain and the type of classification (e.g., binary vs. multi-class) being attempted, such as, for example, PPV, NPV, accuracy, prevalence, precision, false discovery rate, false omission rate, recall, sensitivity, diagnostic odds ratio, F1 score, or the like.

FIG. 14 illustrates an example interactive interface element that provides historical information about a set of status indicators, according to at least some embodiments. A metrics history panel 1402 may be used to present a visualization data set associated with a set of metrics obtained from the training iterations that have been conducted thus far at the classification service with respect to a given classification problem in the depicted embodiment. The panel 1402 may comprise a temporal axis 1404, which may for example show integer iteration identifiers and/or respective timestamps at which various training iterations were completed. Corresponding to individual ones of the iterations, values of a plurality of metrics may be displayed using respective sub-panels or graph display regions, such as PPV sub-panel 1408, NPV sub-panel 1411, coverage sub-panel 1417, and/or cumulative labels sub-panel 1419 in various embodiments. The particular metrics for which historical trend information is to be displayed may be indicated programmatically by authorized users or clients of the classification service in various embodiments. The presentation of the metrics values in a vertically stacked manner similar to that shown in FIG. 14 may be extremely helpful to analysts, in that it may be easy to view the values of all the different metrics as of any given iteration or time, as indicated by vertical line 1490 in the depicted embodiment corresponding to iteration 33.

Historical trend information of the kind shown in FIG. 14 may be useful to data scientists interested in analyzing the progress of classifier development in various embodiments, as it may make it much easier to grasp how close the models are to convergence, to understand why some metrics have not yet met target thresholds, and at least in some cases to determine the kinds of guidance that should be provided (if any) to the classification service to enhance or accelerate future training iterations. As new data items are added to the training set, especially during early iterations of the overall training process, the mix of data item characteristics represented in the training sets may in some cases change substantially, which may potentially cause some metrics to fluctuate dramatically. For example, the line AA′ in sub-panel 1408 shows substantial changes in PPV during a recent set of training iterations, which may indicate that a number of additional training iterations may be needed before PPV targets are met in the depicted example scenario.

With respect to at least some metrics, confidence bounds or intervals 1410 may also be indicated in the metrics history sub-panels in various embodiments. For example, with respect to NPV, the mean value over recent iterations is shown by line BB′ in sub-panel 1411, while upper and lower confidence bounds (for some selected confidence interval such as 95%) may be indicated via lines CC′ and DD′ respectively. The confidence intervals for which such lines are to be displayed, as well as whether such lines should be displayed at all for a particular metric, may be selectable based on user-provided input in some embodiments.

A coverage sub-panel 1417 (shown for a binary classification problem in FIG. 14 by way of example) may be used in some embodiments to indicate the relative fractions of data items that are predicted to be members of different classes in various iterations, and may also be used to indicate the fraction of items that have not been classified with a targeted confidence level in the different iterations. For example, in sub-panel 1417, the distance between line EE′ and the top of the panel in region 1412 may represent the fraction of items predicted (with a target confidence level) as being members of a positive class in a given iteration, the distance between line FF′ and the bottom of the sub-panel in region 1414 may represent the fraction predicted as being members of the negative class, and the distance between lines EE′ and FF′ in the unclassified region 1413 may represent the fraction that are not yet classified in the various iterations for which the visualization is being provided. In sub-panel 1419, the cumulative number of positive labels 1491 (represented by the vertical distance between line GG′ and JJ′) collected from label providers with respect to various iterations, as well as the cumulative number of negative labels 1492 (represented by the vertical distance between line JJ′ and HH′) collected from label providers with respect to various iterations, may be shown, indicating the growing size of the training set as the iterations proceed in the depicted embodiment. Metric-specific axes 1406 may be used to indicate the exact values for the different metrics in various embodiments. In at least some embodiments, interactive interface elements may be provided to enable users to view details for specific iterations, time ranges and/or iteration-count ranges, e.g., using zoom-in controls similar to slider 1439 shown in FIG. 14.
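
The per-iteration fractions plotted in a coverage sub-panel can be derived from the predicted scores of each iteration. A simplified sketch follows, assuming fixed class boundaries (in practice the boundaries may themselves shift from iteration to iteration):

    def coverage_series(iteration_scores, neg_boundary, pos_boundary):
        # For each iteration, compute the fractions of items predicted
        # positive, predicted negative, and still unclassified at the
        # target confidence level (regions 1412, 1414 and 1413).
        series = []
        for scores in iteration_scores:
            n = len(scores)
            pos = sum(s >= pos_boundary for s in scores) / n
            neg = sum(s <= neg_boundary for s in scores) / n
            series.append({"positive": pos,
                           "negative": neg,
                           "unclassified": 1.0 - pos - neg})
        return series

The three fractions for each iteration sum to one, which is why regions 1412, 1413 and 1414 stack to fill the sub-panel vertically.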

Diagnosis Tests for Training Progress and Termination

As indicated above, in at least some embodiments users may select and/or define a set of metrics for which respective values may be collected over classifier iterations, and for which visualization data sets may be prepared and presented to users. In some embodiments, a set of diagnosis tests may also be defined based on the status or trends in various metric values, which may be used as respective binary decision indicators to help determine whether/when to terminate training (i.e., to stop executing additional training iterations). FIG. 15 illustrates example interactive interface elements that provide information about a set of selected diagnosis tests pertaining to classifier training completion, according to at least some embodiments. The number and definitions of the tests may be determined, for example, by the classification service based on default settings for the type of classification problem being addressed, and/or based on programmatic user input. A summary indicating the number of diagnosis tests that have been passed with respect to the most recent training iteration may be provided via a diagnosis summary box 1502 in some embodiments, which may be presented as part of an interactive web-based or graphical user interface of the kind discussed in the context of FIG. 7.

In response to user input (e.g., when a user clicks within the diagnosis summary box 1502), a diagnosis details list 1504 may be presented via an interactive interface in at least some embodiments. A list of the names of the diagnosis tests being used and their current values, together with one or more status indicator symbols, may be provided in at least one embodiment—e.g., a checkmark next to a diagnosis test name may indicate that the results of the test meet a targeted threshold criterion, an “X” symbol next to a diagnosis test name may indicate that the test's results are unsatisfactory, and so on. Symbols such as the “i” symbol shown next to the PPV Trend test entry may, for example, indicate that the corresponding diagnosis test has not yet been designated as passed (satisfactory) or failed (unsatisfactory), so more analysis may be required. Other symbols may be used to indicate, for example, that additional information (that may not yet have been viewed) may be available pertaining to a given diagnosis test in some embodiments. Users may, in some embodiments, add new diagnosis tests to the list (e.g., using the “Add new diagnosis test” interface element 1592). In at least one embodiment, users may designate some diagnosis tests as mandatory. Similarly, one or more diagnosis tests may be designated as optional in some embodiments.
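
A diagnosis test of the kind shown in details list 1504 can be modeled as a named predicate over collected metric history, with a third, inconclusive outcome corresponding to the “i” symbol. The sketch below is illustrative only; the threshold, window size and test definition are assumptions, not prescribed by the depicted embodiments:

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Optional

    @dataclass
    class DiagnosisTest:
        name: str
        # True = pass (checkmark), False = fail ("X"), None = inconclusive ("i")
        evaluate: Callable[[Dict[str, List[float]]], Optional[bool]]
        mandatory: bool = False

    def ppv_trend_is_stable(history):
        values = history.get("ppv", [])[-5:]
        if len(values) < 5:
            return None            # not enough iterations to judge yet
        return max(values) - min(values) < 0.02

    tests = [DiagnosisTest("PPV Trend", ppv_trend_is_stable, mandatory=True)]
    history = {"ppv": [0.840, 0.845, 0.850, 0.848, 0.852]}
    for t in tests:
        result = t.evaluate(history)
        print(t.name, {True: "pass", False: "fail", None: "inconclusive"}[result])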

In some embodiments, in order to provide guidance regarding the specific types of steps that should be taken to enhance or improve subsequent training, interface controls (such as the “prioritize” control elements 1594) may be used. For example, if a given diagnosis test is considered important by a user such as a data scientist and is far from passing, the user may click on a prioritize element 1594 to signal to the classification service to emphasize improvement of the results of that test, to the extent possible, in future training iterations in some embodiments. Note that such controls may not necessarily be provided or supported for some types of diagnosis tests, as it may be hard to take specific actions focused primarily on such tests. The kinds of training enhancement actions that may be initiated for a particular diagnosis test may include, for example, selecting particular kinds of label feedback candidate data items, expanding the pool of label providers who have been effective at quickly providing labels for a particular class, and so on in various embodiments. In some embodiments prioritization controls of the kind indicated in FIG. 15 may not be implemented. In at least some embodiments, if a particular diagnosis test has not been passed, this may not necessarily imply that the training of the classifier has to continue—e.g., users may override the requirement that all diagnosis tests have to pass before training iterations are ended, and/or a resource limit associated with the training may be exhausted prior to the satisfactory completion of all the initially-identified diagnosis tests.

Interactive Programmatic Interfaces with Role-Based Tabs

In at least some embodiments, a set of interactive programmatic interfaces which is organized using tabs into portions corresponding (at least approximately) to different stages of classifier development and/or to different user roles may be implemented. FIG. 16 illustrates aspects of an example configuration setup tab of an interactive interface for training classifiers, according to at least some embodiments. As shown, tabbed interactive interface 1601 (which may be implemented, for example, via a set of web pages and/or a graphical user interface) may indicate the name of the interface 1602 (e.g., “Classifier Management Tool”), one or more interface elements to create new classifier instances (e.g., the “Create new” element 1604) and/or to search for classifier instances (e.g., via search entry element 1606), a completion status element 1608, and a set 1610 of tabbed sub-panels in the depicted embodiment, including a Setup tab, a Range Classes tab, a Teach tab, an Evaluate tab and a Conclude tab. In the example view depicted in FIG. 16, the Setup tab is selected, as indicated by the thicker outline of the “Setup” name relative to the other tab names. The completion status element 1608 may, for example, be used to provide an overall summary of the status of the classification workflow that is underway in the depicted embodiment, and may in some embodiments be visible from multiple tabs (i.e., when any of several tabs happens to be selected).

The Setup tab interface controls may be used, as suggested by the name, to initiate a classifier workflow in various embodiments. The classifier identity section 1612 may, for example, be used to provide the classifier type (e.g., binary vs. multi-class), a name for the classifier, and/or a description of the classification effort (e.g., goals or objectives of the classifier, etc.) in the depicted embodiment. In at least some embodiments, several different versions of a classifier may be saved (and/or used in production environments) over time, and the version section 1614 may be used to indicate the reason for a new version being set up, the differences between the newer version and older versions, and so on. A class definition section 1616 may be used to provide information about the target classes into which data items are to be categorized in the depicted embodiment, and/or the manner in which data items from the data sources are to be selected to form the input data for training the classifier. For example, the names of target classes, the names and values of data item attributes to be used to select data items from data sources as part of the input data for the classification effort (which may be specified via the “include attributes” element), and/or the names and values of data item attributes to be used to exclude data items from the data sources when collecting input data for the classification effort (which may be specified via the “exclude attributes” element) may be indicated via the class definition section in at least some embodiments. The “include” and/or “exclude” attribute information provided by an entity responsible for setting up a classifier development workflow may, for example, be used by elements of the retrieval subsystem discussed earlier, e.g., in the context of FIG. 7, to extract subsets of relevant data from data sources.
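
The include/exclude attribute information gathered via the Setup tab lends itself to a straightforward item-selection predicate at the retrieval subsystem. A hedged sketch follows; the attribute names and configuration layout are hypothetical:

    setup_config = {
        "classifier_type": "binary",
        "include_attributes": {"category": ["electronics"]},
        "exclude_attributes": {"status": ["discontinued"]},
    }

    def item_selected(item, config):
        # Keep an item only if it matches every "include" rule and
        # no "exclude" rule from the class definition section.
        for attr, values in config["include_attributes"].items():
            if item.get(attr) not in values:
                return False
        for attr, values in config["exclude_attributes"].items():
            if item.get(attr) in values:
                return False
        return True

    item = {"category": "electronics", "status": "active"}
    print(item_selected(item, setup_config))  # True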

FIG. 17 illustrates additional aspects of an example configuration setup tab of an interactive interface for training classifiers, according to at least some embodiments. The example view shown in FIG. 17 may, for example, be obtained by scrolling down starting from the view depicted in FIG. 16. Tab set 1610 still shows that the Setup tab is selected, as in FIG. 16. A portion of the class definition section 1616 is shown in FIG. 17 to indicate that the session control section 1712 may also be included as part of the same scrollable web page or graphical user interface view as the class definition section in at least some embodiments.

The session control section 1712 may be used to configure and manage interactive labeling sessions, which may also be referred to as “teaching sessions” in the depicted embodiment. For a given teaching session, the example interface depicted in FIG. 17 may indicate the starting time, the status of the session as a whole (e.g., whether the session is ongoing, suspended, etc.), as well as target training metric values pertaining to the session (minimum PPV, minimum NPV, etc.). Interface elements may be provided to resume the session if it is suspended, to start new sessions, to add metrics, and so on in various embodiments. In the depicted embodiment, the session control section 1712 may include controls regarding importing, exporting and/or deleting labels. For example, if a given collection of data items is being re-used for a new classifier or a new version of a previously-created classifier, labels may be extracted or imported from a data store in some embodiments, e.g., using the “Import” interface control shown in the session control section. Labels obtained during the session may be exported or saved (e.g., using the “Export” interface element) to a data store, from where they may later be extracted. Labels generated in a given session may also be deleted/cleaned (e.g., using the “Clean/delete” interface element) in various embodiments if desired. In at least some embodiments, users may be able to easily obtain explanatory information about some or all of the terms used in the interactive interface—e.g., by hovering the mouse over the “Minimum NPV” element of session control section 1712, an explanation of the NPV metric and/or why it may be beneficial to specify the metric may be provided. Similar explanatory interface elements may be implemented for the un-tabbed interfaces discussed earlier (e.g., the interface shown in FIG. 7 and its components) in various embodiments.

Example Class Range Tab View

FIG. 18 illustrates aspects of an example class range definition tab of an interactive interface for training classifiers, according to at least some embodiments. In the depicted embodiment, a “Range Classes” tab of tab set 1610 may be selected, and used for example to indicate the definitions 1806 of various target classes and/or justifications for designating data items as members of the various target classes. The names 1804 of the target classes may have been indicated via the class definition portions of the setup tab view, as indicated above, in at least some embodiments. The portions of the interface shown in FIG. 18 may be used to provide information about the meaning or semantics of the classes, an authority 1808 or reference on whose basis a given class is being defined (for example, a regulation or law which may be used to classify items of an inventory), a set of external resources 1810 (such as web sites) that may be used to find out more information about the class, and so on. In at least some embodiments, a set of one or more justifying attributes may be entered, e.g., in a “justification table” section 1812 of the interface, to provide at least some level of reasoning why a given data item should be designated as a member of a particular class. Such justification information may, for example, be used to perform an initial rough classification effort, e.g., to obtain a small set of training data that can be used to start training iterations for the classifier being developed in some embodiments.

Example Labeling Feedback Tab View

FIG. 19 illustrates aspects of an example labeling feedback tab of an interactive interface for training classifiers, according to at least some embodiments. The “Teach” tab may have been selected from the tab set 1610 in the depicted embodiment to reach the labeling feedback section of the interface, a portion of the interface usable by label providers (and/or other types of users). Manual, automatic or intermediate levels of automation may be selected for the labeling interactions in the depicted embodiment, e.g., using the “teaching strategy” interface element 1912. Filter criteria section 1942 may be used by a label provider to indicate search terms (for which recommended search predicates may be generated and presented as discussed earlier), classification service-generated class predictions (corresponding to the “machine classification” interface options) and/or labeler-provided labels (corresponding to the “human classification” interface options) in the depicted embodiment. For example, if a user wants to view data items that have been classified by the classification service as members of class “A”, while being labeled as class “B” members by users, the machine classification option for class A may be selected, and the human classification option for class B may be selected using filter criteria section 1942 in the depicted embodiment. In addition, in the depicted embodiment, data items may be filtered using service-provided justifications (e.g., by entering text into the “Machine justification” text block of section 1942) and/or labeler-provided justifications (e.g., by entering text into the “Human justification” text block). Element 1944 of the Teach tab may indicate how many data item entries were retrieved in response to the previously-submitted filtering criteria (1,234 in the depicted scenario), and the subset (items 20-29 in the depicted scenario) that are currently being presented. Element 1946 may be used to label groups of data items in a single interaction, e.g., as members of class A, class B, or unclassified in the depicted embodiment.
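
The combination of machine-classification and human-classification filter options described above amounts to a conjunction of per-field predicates. One possible sketch (the field names are assumptions for illustration):

    def matches_filter(item, machine_class=None, human_class=None,
                       machine_justification_text=None):
        # Each supplied criterion must match; unsupplied criteria are ignored.
        if machine_class and item.get("machine_classification") != machine_class:
            return False
        if human_class and item.get("human_classification") != human_class:
            return False
        if machine_justification_text and machine_justification_text \
                not in item.get("machine_justification", ""):
            return False
        return True

    items = [{"machine_classification": "A", "human_classification": "B"},
             {"machine_classification": "A", "human_classification": "A"}]
    # Items the service classified as "A" but labelers marked as "B":
    disagreements = [i for i in items if matches_filter(i, "A", "B")]
    print(len(disagreements))  # 1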

The labeling candidate data items themselves may each be represented by a respective panel 1914, such as 1914A and 1914B in the depicted embodiment. For a given data item 1914, several pieces of information similar to those shown in FIG. 7 may be provided—e.g., an item image 1916 may be shown, an image title 1919 may be presented, description details 1917 may be provided, and additional summarized attributes 1918 may be included in the display in various embodiments. In the depicted embodiment, interface elements for information about the justifications or explanations for service-provided class predictions and/or labeler-provided class names may be included. For example, if a class prediction for a data item has been made by the service, the class name may be indicated in the “Machine classification” portion of the item panel 1914, and a justification (if available) may be provided in the “Machine justification” element. Similarly, if/when the labeler decides to designate the item 1914 as a member of a particular class, the class may be indicated via the “Human classification” element of panel 1914 in the depicted embodiment.

Example Evaluation Tab View

FIG. 20 illustrates aspects of an example evaluation tab of an interactive interface for training classifiers, according to at least some embodiments. In the evaluation tab, information about a set of diagnosis tests may be provided, and new diagnosis tests may be specified in at least some embodiments. In the diagnosis test list section 2016, a list of various diagnosis tests that have been identified for the current classifier development effort may be shown, along with a respective pass/fail status (indicated by a checkmark or an X respectively). An “Add diagnosis test” interface element may be used to introduce new diagnosis tests to the suite of tests being used for evaluating the state of the classifier in the depicted embodiment.

As a result of an interaction such as clicking on the name of a particular diagnosis test in list 2016, a panel comprising various types of detailed information 2014 may be displayed for that test. The details may include, for example, the current test result status (e.g., “Pass” or “Fail”), an explanation of the test, potential causes for an unsatisfactory status, zero or more potential remedies, and/or an interface element that may be used to prioritize the test when selecting additional labeling candidates in the depicted embodiment. Information about various metrics that are being collected (some of which may be used for the diagnosis tests) may be provided in a current metrics panel 2019 of the evaluation tab in the depicted embodiment. Details of historical values and trends of selected metrics, similar to the kinds of information illustrated in FIG. 14, may be provided via one or more metrics history panels 2021 in the depicted embodiment.

Example Training Conclusion Tab View

One of the tabs of a multi-tab interactive interface may be used to terminate further training and approve a trained classifier in some embodiments. FIG. 21 illustrates aspects of an example training effort pause and termination tab of an interactive interface for training classifiers, according to at least some embodiments. The tab may be labeled the “Conclude” tab in the depicted embodiment. As shown, such a tab may provide at least three types of interactive controls in some embodiments: a “Publish” control to approve the classifier (for which various evaluation and/or diagnosis results may have been presented via the “Evaluate” tab), a “Pause sessions” control to stop further labeling and training at least temporarily, and a “Discard version” control to remove information about the current version's training iterations.

Note that although a set of diagnosis tests may have been selected (e.g., based on user input) to help make decisions as to when training of a model should be considered complete in various embodiments, the “Publish” interface element may be used to override the diagnosis-based termination of training—e.g., a given classifier may be approved and published for production use in some embodiments even if some diagnosis tests have not yet been passed. Analogously, in at least some embodiments, a classifier may not be approved or transitioned to production even if all the selected diagnosis tests have succeeded—e.g., an authorized user may discard the current version if desired, regardless of how many diagnosis tests have succeeded. In at least one embodiment, respective sets of users of a classification service may be granted permissions to access and/or interact with controls of different sets of tabs of the kind shown in FIG. 15-FIG. 21. For example, some users may be granted permissions to access and provide input via the Setup and Range Classes tabs, others (e.g., label providers) via only the Teach tab, and yet others via the Evaluate and/or Conclude tabs. The multi-tab interface may provide a simplified way of separating the responsibilities associated with various phases of the workflow of training a classifier using interactive guided labeling in various embodiments. It is noted that a given classification service or tool may employ any desired combination of the elements and features illustrated in FIG. 7-FIG. 21 in various embodiments. At least some of the individual interface features shown in FIG. 7-FIG. 21 may not necessarily be implemented in some embodiments.

Classification Service API Overview

The classification service may implement a set of application programming interfaces (APIs) in various embodiments, which may be invoked from client-side devices (e.g., desktops, laptops, tablet devices, phones or the like) when clients of the service utilize interactive interface elements similar to those discussed above. That is, in such embodiments, the interactions of users with graphical or web-based interfaces (e.g., requests to start a new classifier training workflow, submissions of one or more labels for candidate data items, etc.) may be translated into respective underlying API calls. In at least some embodiments, one or more programmatic interactions with the service may be performed directly using API calls—e.g., a training workflow of a given classification effort may be initiated via an API call without using a graphical user interface.

FIG. 22 illustrates a high-level overview of invocations of application programming interfaces for interactions between clients and a machine learning service utilizing interactive labeling feedback for classifier training, according to at least some embodiments. As shown, a classification service 2280 may be implemented as a subcomponent of a more general machine learning service (MLS) 2212 in the depicted embodiment. The machine learning service 2212 may implement one or more programmatic interfaces 2277 for its clients 2210, including for example a set of APIs which may be invoked directly or indirectly by clients to submit requests for various machine learning tasks, receive responses to such requests, receive asynchronous notifications regarding the status of various tasks, and so on. Other types of programmatic interfaces, such as web-based interactive sites, command-line tools, or graphical user interfaces may also be implemented in various embodiments by the MLS.

A client 2210 may submit an InitiateClassifierTraining request 2214 to set up a classifier training configuration in the depicted embodiment. The request may indicate various properties of a desired classifier via respective parameters; example parameters that may be specified for such a request in some embodiments are discussed in further detail below in the context of FIG. 23. In response to the request, preparatory actions may be undertaken at the MLS, such as identifying one or more labeling feedback providers that are available for the classifier, identifying a set of computing platforms to be used as training resources, and so on. When the MLS has completed its preparations, a ReadyToTrain response 2215 may be transmitted to the client 2210 in the depicted embodiment. In some embodiments, it may take some time to complete the preparations, and the ReadyToTrain response 2215 may be provided via an asynchronous mechanism such as an email, a text message or the like.

The client 2210 may then submit a BeginIterations request 2217 to the MLS in the depicted embodiment to start the training iterations. The MLS may start one or more guided interactive labeling sessions 2252, similar to those discussed above, with a selected group of one or more label providers 2250 in various embodiments. A relatively small training set (whose labels may, for example, be assigned automatically by the MLS based on keywords associated with target classes) may be used for the first training iteration in some embodiments, and the training set may then be enlarged via the sessions 2252 based on presentation of labeling candidate data items to the label providers 2250. The labeling candidate data items may be presented as part of a visualization data set in an order based on a respective rank, with respect to estimated learning contribution, associated with including individual ones of the candidates in a training set for a subsequent training iteration in at least some embodiments. As a result of the ordering, even if a label provider is able to provide just a few labels for the candidates presented at the start of the ordered collection, the MLS may be able to include more useful labels in the training sets for subsequent iterations than if randomly-selected data items had been labeled in such embodiments. The submissions of labels by label providers 2250 may be asynchronous with respect to the start/end of any given training iteration in various embodiments—e.g., when a given training iteration is completed at the back-end servers of the MLS, the set of labels that have been submitted since the most recent training set was constructed may be used to expand the training set for the next training iteration. In at least one embodiment, the label providers 2250 may not necessarily be made aware of the starting and ending of training iterations—instead, they may iteratively receive new sets of labeling candidates, and submit labels and/or filtering criteria for data items as and when desired, until the training is eventually terminated.

Status indicators or updates 2219 with respect to various classifier training metrics may be provided to clients by the MLS 2212 in the depicted embodiment, e.g., automatically or in response to additional programmatic requests. In at least some embodiments, the set of metrics whose status is to be provided may be defined and/or selected by the clients 2210; in other embodiments, the MLS 2212 may select a default set of metrics for which status indicators are to be provided, e.g., based on the type of classification (binary versus multi-class) being performed, and/or based on the problem domain being addressed. In some cases the status indicators may be derived from trends in the underlying metrics—e.g., a stability status may be determined for a given metric based on the variation among the most recent N values collected for that metric. Based on objectives associated with the status indicators and/or the underlying metrics, one or more training enhancement actions may be undertaken in various embodiments—e.g., the kinds of labeling feedback candidate items to be presented to one or more label providers may be determined such that the value of a particular metric may be expected to move in a desired direction.
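
A stability-style status indicator of the kind just described can be derived from the trailing window of a metric's history. In the sketch below, the window size and tolerance are arbitrary placeholders:

    def stability_status(values, n=5, tolerance=0.01):
        # Classify a metric as stable if its most recent n values
        # vary by no more than the tolerance.
        recent = values[-n:]
        if len(recent) < n:
            return "insufficient-data"
        spread = max(recent) - min(recent)
        return "stable" if spread <= tolerance else "fluctuating"

    print(stability_status([0.72, 0.79, 0.80, 0.80, 0.80, 0.81]))
    # last five values span 0.02 > 0.01 -> "fluctuating"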

Eventually, e.g., after some set of specified training objectives has been satisfied and/or a budget for training resources or time has been exhausted, a TrainingComplete message 2221 may be transmitted from the MLS to the client in the depicted embodiment. Alternatively, in some embodiments, training iterations may be terminated at the initiative of the client, e.g., in response to receiving a programmatic request to terminate the training. After the training is complete, a trained version of the classifier (e.g., the iteration-final classifier of the most recent training iteration) may be used to classify data items that were not used during training. For example, in the depicted embodiment, a ClassifyNewItem request 2223 may be submitted programmatically to the MLS, and a PredictedClass response 2225 may be sent in response based on the prediction generated by the trained model. A number of other types of programmatic interactions via interfaces 2277 may be supported in various embodiments, and some of the programmatic interactions indicated in FIG. 22 may not be supported in at least one embodiment.

Example Classifier Training Request

FIG. 23 illustrates example elements of a programmatic request to initiate training of a classifier, according to at least some embodiments. As shown, an InitiateClassifierTraining request 2302 may include, among other parameters, a class descriptors parameter 2305, a data sources parameter 2308, identifiers or names of one or more item retrieval algorithms 2311, a set of item attribute descriptors 2314, identifiers or names of one or more classification algorithms 2317, identifiers or names of one or more vectorization algorithms 2318, identifiers or names of one or more active learning algorithms 2320, one or more metrics descriptors 2323, one or more iteration completion criteria 2326, one or more training completion criteria 2329, label provider information 2332, and/or a set of default interactive interface settings 2335 in the depicted embodiment.

The class descriptors parameter 2305 may be used to specify the target classes into which data items are to be categorized in the depicted embodiment, and/or one or more specific attribute values (such as keywords included in the titles/descriptions of the data items) that may be used to create an initial training set. The data sources parameter 2308 may, for example, identify databases, log files, web sites or the like from which the data items may be retrieved by the classification service, as well as credentials to be used to access the data items in some embodiments. Item retrieval algorithms 2311 may, for example, indicate how data items relevant to the classification effort should be obtained from the data sources—e.g., whether a search based on keywords should be used, whether all the data items of a given data source should be retrieved, and whether decryption/decompression is to be performed (and if so, the algorithms to be used for decryption/decompression). Attribute descriptors 2314 may indicate the names and descriptions of various relevant attributes of the data items, and how such attributes may be parsed/extracted from the raw data items if needed in at least some embodiments.
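
Expressed concretely, an InitiateClassifierTraining request carrying the parameters of FIG. 23 might resemble the following Python dictionary. Every key and value here is an illustrative assumption rather than the service's actual schema:

    initiate_classifier_training_request = {
        "class_descriptors": [{"name": "positive", "seed_keywords": ["recall", "defect"]}],
        "data_sources": [{"type": "database", "uri": "db://inventory/items"}],
        "item_retrieval_algorithms": ["keyword-search"],
        "item_attribute_descriptors": [{"name": "title", "extract_from": "field:title"}],
        "classification_algorithms": ["logistic-regression"],
        "vectorization_algorithms": ["bag-of-words"],
        "active_learning_algorithms": ["query-by-committee"],
        "metrics_descriptors": ["ppv", "npv", "coverage"],
        "iteration_completion_criteria": {"max_epochs": 3},
        "training_completion_criteria": {"required_diagnosis_tests": ["PPV Trend"]},
        "label_provider_info": {"max_providers": 10},
        "interface_settings": {"items_per_page": 10},
    }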

The specific algorithm(s) to be used, e.g., at least for the final classifier of various training iterations, such as a logistic regression algorithm, a neural network-based algorithm, or the like, may be indicated via the classification algorithms parameter 2317 in the depicted embodiment. One or more algorithms to be used for generating feature vectors from the raw attribute values may be indicated via the vectorization algorithms parameter 2318 in some embodiments. Active learning algorithms (such as query by committee, expected error reduction, variance reduction, or the like) to be used to help rank label feedback candidate data items, such that more useful (from the point of view of accelerating learning) data items have a higher probability of being labeled earlier in the process of expanding the training set, may be specified via the active learning algorithms parameter 2320 in the depicted embodiment.
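
Of the active learning approaches mentioned, query by committee is easy to sketch: items on which a committee of classifiers disagrees most are ranked first for labeling feedback. The vote-entropy variant below is one common formulation, shown purely as an illustration:

    import math
    from collections import Counter

    def vote_entropy_rank(committee_predictions):
        # committee_predictions: item id -> list of class votes, one per
        # committee member. Higher vote entropy means more disagreement,
        # so those items are presented for labeling first.
        ranked = []
        for item_id, votes in committee_predictions.items():
            counts = Counter(votes)
            total = len(votes)
            entropy = -sum((c / total) * math.log2(c / total)
                           for c in counts.values())
            ranked.append((entropy, item_id))
        return [item_id for _, item_id in sorted(ranked, reverse=True)]

    preds = {"item1": ["A", "A", "A"],
             "item2": ["A", "B", "B"],
             "item3": ["B", "B", "B"]}
    print(vote_entropy_rank(preds))  # item2 (split vote) comes first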

Information about one or more metrics to be collected from the training iterations may be provided via the metrics descriptors parameter 2323 in some embodiments. A given training iteration may comprise one or more epochs (passes through the training data set available for the iteration) in some embodiments. Rules to determine when a given training iteration is to be considered complete (e.g., when a specified number of epochs is completed, when the difference in a metric value from a previous epoch falls below a threshold, when a specified amount of time has elapsed, when a specified amount of processing resources has been consumed, etc.) may be indicated via the iteration completion criteria parameter 2326 in some embodiments. Similarly, in at least some embodiments, the training completion criteria parameter 2329 may include a set of rules (e.g., the set of diagnosis tests that have to be passed, an overall training budget expressed in terms of resource usage or time, etc.) to be used by the classification service to stop scheduling further training iterations.

In some embodiments, at least some level of guidance regarding the label providers to be used during the classification effort may be provided to the classification service as part of a classifier training initiation request. For example, in the depicted embodiment, the label provider information parameter 2332 may be used to indicate a set of label providers that may be available, or may have the requisite subject matter knowledge, to provide labels for the classifiers. The maximum or minimum number of label providers to be used, and/or a budget associated with label generation (e.g., in a scenario where the label providers are being paid based at least in part on the number or rate of labels they provide) may be indicated in parameter 2332 in some embodiments. In at least one embodiment, one or more settings for the interactive programmatic interfaces to be used, such as the number of candidate data items to be presented at a time to a given label provider, may be specified via parameter 2335.

In one embodiment, some or all of the parameters indicated in FIG. 23 may be specified as part of a configuration file in some selected format (e.g., JavaScript Object Notation (JSON), Extensible Markup Language (XML) or the like). In at least some embodiments, one or more of the parameters indicated in FIG. 23 may not necessarily be included in a training initiation request by a client. In various embodiments, default values may be selected at the classification service for some parameters for which specific values are not provided by the client. In some embodiments, values for individual ones of the parameters may be specified in separate programmatic interactions—that is, not all the parameter values may be sent in the same request. Other parameters, not indicated in FIG. 23, may be transmitted to the service in some embodiments.

Customized Labeling Sessions

In scenarios in which multiple label providers are used, individual ones of the label providers may have differing capabilities and responsiveness characteristics—e.g., some label providers may be faster or otherwise superior to others with respect to identifying data items of particular classes, and so on. FIG. 24 illustrates an example scenario in which the set of candidate data items presented for labeling feedback may be customized for respective label providers, according to at least some embodiments. In the depicted embodiment, a classifier training subsystem 2402 may comprise, among other components, a label provider skills/capabilities detector 2404 implemented using one or more computing devices. Such a detector may, for example, keep track of how quickly different label providers such as 2420A, 2420B or 2420C respond to label feedback requests, the extent to which the labels provided by the different label providers 2420 tend to agree with the class predictions generated at the training subsystem, and so on. Using such metrics, respective profiles of the different label providers may be generated in at least some embodiments.

In turn, the profiles or characteristics of the label providers may be used in some embodiments to customize the respective subsets of candidate data items that are presented to the various label providers. For example, customized subset 2410A may be presented to label provider 2420A, customized subset 2410B may be presented to label provider 2420B, and customized subset 2410C may be provided to label provider 2420C. In one embodiment, the skills/capabilities detector may itself employ machine learning models to help select the set of labeling candidate data items to be presented to the individual label providers. The subsets 2410 may differ from one another, for example, in size (e.g., label providers that provide feedback faster may be presented with more candidates), in data item attributes (e.g., a label provider that has been identified as being better at discriminating among hard-to-label data items may be presented with candidate data items that are ranked higher in degree of labeling difficulty), and so on.

In at least one embodiment, the programmatic interfaces of the classification service may be used by authorized entities (e.g., the data scientists analyzing the progress of the classifier, stakeholders on whose behalf the classifier is being trained, and so on) to view the labels (and/or justifications for such labels) produced by individual ones of the label providers. For example, a user may be able to view label sets and justifications 2422A corresponding to one or more submissions by label provider 2420A during a guided labeling session, label sets and justifications 2422B from label provider 2420B, label sets and justifications 2422C from label provider 2420C, and so on in the depicted embodiment. In at least some embodiments, to ensure the privacy of individuals performing the roles of label providers, personal identification information pertaining to any given label provider may be obfuscated and/or made inaccessible to other users of the interactive interfaces, but an anonymized identifier (e.g., LP0034 or “label provider 34” for one of a pool of 50 label providers) may be used to distinguish the label providers and their work products from one another. In at least one embodiment, based on an analysis of metrics status or trends, one of the training enhancement operations that may be implemented may comprise increasing or decreasing the size of a pool of label providers being used for a given classifier training exercise. In some embodiments, respective labels for a given data item may be obtained from different label providers, and one or more aggregator(s) 2424 may reconcile the differences (if any) among the labels obtained for that data item. For example, if label providers 2420A and 2420C provided label L1, while label provider 2420B provided label L2, an aggregator 2424 may examine the corresponding justifications (and/or use a majority-vote-based technique) to determine the label that should be used for the data item.
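
The majority-vote aspect of the aggregation just described can be sketched compactly; handling of ties (deferring to justifications) is indicated but not implemented here, and the provider identifiers are the anonymized form discussed above:

    from collections import Counter

    def aggregate_labels(labels_by_provider):
        # Majority vote across providers; a tie is returned as None so
        # that an aggregator can fall back to examining justifications.
        counts = Counter(labels_by_provider.values()).most_common()
        if not counts:
            return None
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            return None
        return counts[0][0]

    print(aggregate_labels({"LP0034": "L1", "LP0035": "L2", "LP0036": "L1"}))  # L1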

Provider Network Environment

In at least some embodiments, the classification service may be implemented as part of a suite of services of a provider network. FIG. 25 illustrates an example provider network environment in which a classification service may be implemented, according to at least some embodiments. Networks set up by an entity such as a company or a public sector organization to provide one or more network-accessible services (such as various types of cloud-based computing, storage or analytics services) accessible via the Internet and/or other networks to a distributed set of clients may be termed provider networks in one or more embodiments. A provider network may sometimes be referred to as a “public cloud” environment. The resources of a provider network, or even a given service of a provider network, may in some cases be distributed across multiple data centers, which in turn may be distributed among numerous geographical regions (e.g., with each region corresponding to one or more cities, states or countries).

In the depicted embodiment, provider network 2501 may comprise resources used to implement a plurality of services, including for example a virtualized computing service (VCS) 2503, a database/storage service 2523, and a machine learning service (MLS) 2571. The machine learning service 2571 in turn may comprise a classification service 2543 (which may have at least some of the features and functionality of the classification service discussed in the context of FIG. 1 and other figures) in at least some embodiments; in other embodiments, the classification service may be implemented as a separate service rather than as a component of the MLS. Components of a given service may utilize components of other services in the depicted embodiment—e.g., for some machine learning tasks, a component of the machine learning service 2571 may utilize virtual machines implemented at computing platforms such as 2505A-2505D of the virtualized computing service, the raw data and/or metadata for various machine learning tasks may be stored at storage servers 2525 (e.g., 2525A-2525D) of storage service 2523, and so on. Individual ones of the services shown in FIG. 25 may implement a respective set of programmatic interfaces 2577 which can be used by external and/or internal clients (where the internal clients may comprise components of other services) in the depicted embodiment.

As shown, the classification service 2543 may comprise, among other components, a training subsystem 2547 and a run-time subsystem 2548 in the depicted embodiment. The training subsystem may comprise one or more computing devices that collectively coordinate the implementation of the training iterations and the asynchronous labeling sessions as discussed earlier in various embodiments. The run-time subsystem may comprise one or more computing devices which may be used to manage the execution of trained classifiers to provide class predictions after the training iterations are complete. The classification service 2543 may interact with one or more other services of the provider network in at least two ways in the depicted embodiment. First, resources of other services, such as computing platforms 2505 or storage servers 2525, may be used to perform some of the computations involved in classifier training and execution, and/or to store input data or results of classifiers—e.g., one or more of the data sources from which data items are retrieved may comprise resources of the database/storage service. The storage service 2523 and/or the VCS 2503 may each provide high levels of availability, data durability, and failure resilience, enabling workloads associated with a large collection of classification customers to be handled in various embodiments. Second, in various embodiments, algorithms obtained from algorithm library 2575 may be used for various aspects of classifier training, labeling feedback candidate selection and the like. In some embodiments, execution platforms 2576 that are optimized specifically for machine learning algorithms may be employed for classifier training and/or execution. Job schedulers 2579 may coordinate resource allocation and scheduling for numerous classifier development efforts concurrently in some embodiments. In one embodiment, online/real-time analysis managers 2577 of the MLS may be used to respond to classification requests for streaming data records as soon as the records are obtained.

In some embodiments, the techniques for supporting the training and execution of classifiers may be implemented without acquiring resources of network-accessible services such as those shown in FIG. 25. For example, a standalone tool implemented at one or more computing devices which are not part of a network-accessible service may be used in some embodiments.

Methods for Implementing a Classification Service

FIG. 26 is a flow diagram illustrating aspects of operations that may be performed to train classifiers with the help of interactive labeling feedback sessions, according to at least some embodiments. High-level parameters (as well as meta-parameters) of a classifier training and evaluation effort may be determined (element 2601). At least some of the parameters or meta-parameters may be obtained from programmatic requests submitted by clients of a classification service and/or a broader machine learning service of the kinds discussed earlier in some embodiments. Other parameters may be selected by the service itself in various embodiments. Based on guidance provided by the clients regarding the classification tasks to be performed and/or a knowledge base of the classification service, a set of data sources and associated data item retrieval techniques may be determined, the type of machine learning models to be used for the classification and/or active learning may be identified, class definitions of the target classes may be obtained and stored, an initial training set (e.g., with labels assigned based on keywords alone, or obtained from a set of label providers) may be assembled, metrics to be collected during the training iterations may be selected, and session control information (e.g., the number of guided labeling feedback sessions to be used) and the like may be identified in the depicted embodiment. The specific set of resources to be used for training and evaluation may be identified as well at the service (element 2604), e.g., from a pool of such resources available at the service or accessible from the service.

One or more training iterations may be initiated for the classification problem being addressed (element 2607), in which the resources identified in operations corresponding to element 2604 may be utilized. At a high level, a given training iteration may comprise at least two categories of operations in various embodiments: back-end operations at the classification service, in which one or more classifiers may be trained using the available labeled data, and front-end or client-side operations, in which new labels (or, in some cases, corrected labels) may be requested and obtained from a set of label providers via a set of guided labeling feedback sessions. The two types of operations may be performed asynchronously of one another in various embodiments—e.g., new versions of models may be trained after an updated training set is identified, without waiting for responses for all outstanding labeling candidates, and new labeling candidates may be presented in the labeling sessions in batches, with the batches being presented independently of exactly when the training iterations are completed or initiated at the back end. The overall goal of the workflow may be to quickly accumulate a training set of labeled data items that are more likely to contribute to classifier learning than other data items, such that a classifier which meets desired quality criteria is trained as soon as possible, in various embodiments.

A set of class labels may be obtained for some number of data items (element 2610). The set of data items for which labeling feedback is solicited from a pool of label providers may be selected using any of a variety of techniques in various embodiments—e.g., for some early iterations, random selection/sampling may be used, while for other iterations, active learning in combination with filtering based on user-supplied filtering criteria of the kinds discussed above (including for example search terms) may be used. As mentioned earlier, any combination of a variety of active learning algorithms, including query-by-committee, uncertainty sampling and the like, may be used in different embodiments. At least in some embodiments, the current status of various training metrics and/or diagnosis test results may be used to help select the candidate data items. Using the class labels that have been obtained/accumulated thus far, one or more classifiers (such as a committee of classifiers) may be trained in various embodiments (element 2613). In some embodiments, as mentioned earlier, two types of classifiers may be trained: one group of classifiers trained using respective subsets of the available labeled data, and an iteration-final classifier trained with all the available labeled data. In one such embodiment, the results obtained from the iteration-final classifier may be used to determine the overall training progress, while the results obtained from the first group may be used to help select additional candidates for labeling feedback (e.g., based on the variance in class predictions measured for various data items of the test sets, or based on the proximity of the predicted classes to class boundaries). Depending on the nature of the classification problem being addressed, binary or multi-class classification models may be used in various embodiments. Any of a wide variety of classification algorithms may be used in different embodiments, including for example logistic regression, neural network based algorithms, tree-based algorithms such as Random Forest, and the like.

If training completion criteria are met (as detected in element 2616), training iterations may be terminated (element 2622) (i.e., no additional iterations may be scheduled) in various embodiments. A trained classifier (e.g., the iteration-final classifier of the last completed training iteration) may be stored and/or used to generate and provide class predictions for various data items that were not used for training. A variety of training completion criteria may be used in different embodiments—e.g., training may be terminated if a set of quality criteria are met by the classifier(s) being developed, or if a resource or time budget is exhausted. In at least one embodiment a set of diagnosis tests may be identified to help decide when a classifier reaches an acceptable quality level. If training completion criteria are not met (as also detected in operations corresponding to element 2616), a set of additional candidate data items may be identified for labeling feedback in a subsequent training iteration (element 2619), e.g., with the help of an active learning algorithm, impact-based sampling (in which the impact on one or more training metrics of labeling various items and including them in a training data set is estimated) and/or other sampling algorithms, and the next training iteration may be initiated. In the next training iteration, operations corresponding to element 2607 onwards may again be performed in the depicted embodiment.

FIG. 27 is a flow diagram illustrating aspects of operations that may be performed during interactive labeling sessions of a classification service, according to at least some embodiments. High-level parameters (as well as meta-parameters) of a classifier training and evaluation effort may be determined (element 2701). Such parameters may include, for example, the type of classifier to be used, the type of active learning algorithm to be used, and so on. At least some of the parameters or meta-parameters may be obtained via requests submitted by clients, e.g., of a classification service and/or a broader machine learning service of the kind described above, via a programmatic interface in some embodiments. Other parameters may be selected by the service itself in various embodiments. Based on the guidelines/preferences of the clients regarding the classification tasks to be performed, a set of data sources and associated retrieval techniques may be determined, the type of machine learning models to be used for the classification and/or active learning may be identified, class definitions of the target classes may be obtained and stored, and an initial training set (e.g., with labels assigned based on keywords alone), session control information (e.g., the number of guided labeling feedback sessions to be used) and the like may be identified in the depicted embodiment. The specific set of resources to be used for training and evaluation may be identified as well at the service (element 2704), e.g., from a pool of such resources available at the service or accessible from the service.

A guided labeling feedback session may be initiated in the depicted embodiment (element 2707), which may proceed concurrently with asynchronous classifier training iterations at the back-end of the service. That is, the starting and ending of a training iteration may not necessarily be synchronous with the start or end of a feedback session, or with any individual interaction (such as a label submission, a filter criteria submission, etc.) of any given label provider. The sessions may be described as “guided” because the service may present labeling feedback candidate data items to label providers in a specific order, with various kinds of annotations/highlighting, metrics information and filtering tools in the interactive interface, such that the interface as a whole helps the label providers submit more useful labels to the service earlier in the training process in various embodiments. For example, during a given session in one embodiment, a visualization data set may be presented via the interactive programmatic interface, in which information about several labeling candidate data items is included, with the data items arranged in an order based at least in part on a respective rank assigned to the data items with respect to estimated learning contribution and/or one or more other metrics. In addition, in at least some embodiments, a presented view of a data item may indicate (e.g., via highlighting, color, font, etc.) one or more attributes of the data item whose correlation with membership in a particular target class exceeds a threshold.

During the feedback session, respective labels for one or more of the presented data items may be obtained, together with a filter criterion to be used to select additional data items for presentation via additional visualization data sets in some embodiments (element 2713). In at least some embodiments, the ranking and/or selection of individual data items for presentation via the interactive interface may thus be based not just on metrics generated and analyzed at the classification service, but also on filter criteria indicated by the label provider, training status information collected at the service from recent training iterations, results of diagnosis tests of the kind described earlier, and so on. Filtering criteria, which may for example include search terms, target or user-defined labels, or ranges of classification scores indicated via an element (such as a distribution ribbon of the kind described above) of the interactive interface, may help a label provider focus on data items that are of interest to the label provider, or that have characteristics on which the label provider has some expertise, while also helping the model learn quickly. The set of newly-labeled (or re-labeled) data items may be added to the training set to be used for one or more training iterations (e.g., the next training iteration) in various embodiments (element 2716). As more training iterations are performed, additional labels for data items that have been identified as likely to contribute to faster or more goal-directed learning may thus be used to gradually increase the training set size in various embodiments.
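
Combining a label provider's filter criterion with the service's learning-contribution ranking, the selection of the next batch of candidates might be sketched as follows; disagreement_score is assumed to be something like the vote-entropy measure shown earlier, and the batch size is arbitrary:

    def next_candidates(unlabeled_items, disagreement_score, filter_fn, batch_size=10):
        # Apply the label provider's filter first, then order the survivors
        # by estimated learning contribution (highest first) and take a batch.
        eligible = [item for item in unlabeled_items if filter_fn(item)]
        eligible.sort(key=lambda item: disagreement_score(item["id"]), reverse=True)
        return eligible[:batch_size]

    batch = next_candidates(
        [{"id": "item1"}, {"id": "item2"}],
        disagreement_score=lambda item_id: {"item1": 0.2, "item2": 0.9}[item_id],
        filter_fn=lambda item: True)
    print([i["id"] for i in batch])  # ['item2', 'item1']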

If session completion criteria have been met (as detected in element 2719) after a particular submission of labels and/or filtering criteria, the session may be ended (element 2725) in the depicted embodiment. Any combination of various criteria may be used to end a given session in different embodiments—e.g., if the classifier has satisfied its diagnosis tests or an authorized entity has decided that further training is not needed, if a budget associated with the training or the session has been used up, if the label provider has stopped providing labels at a desired rate or of a desired quality level, and so on. After the training process as a whole is terminated, a trained classifier model (whose training set includes labels provided in the session) may be stored and/or used to generate and provide classification predictions for one or more previously-unseen or new data items in various embodiments.

If the session completion criteria are not met (as also detected in element 2719), a combination of filtering (if filtering criteria were provided), active learning (such as query by committee, uncertainty sampling or the like, as discussed earlier), and/or other techniques (such as an indicated priority of a particular diagnosis test) may be used to identify the next set of candidate data items for which labeling feedback is requested in the depicted embodiment (element 2722). Operations corresponding to element 2710 onwards may then be performed again using the new candidates (and/or any data items that remain unlabeled from among the set of data items prepared earlier for presentation) in various embodiments, until eventually the session is terminated. Thus, at least in some embodiments, multiple visualization data sets may potentially be generated and presented to a given label provider during a given labeling session.
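As one concrete example of such an active learning step, a query-by-committee selector might score each unlabeled item by how much a committee of classifiers, trained on different subsets of the labeled data, disagree about its class. A minimal vote-entropy sketch, with assumed array shapes, follows.

```python
import numpy as np

def vote_entropy(committee_probs):
    # committee_probs: (n_models, n_items, n_classes) class probabilities
    # from committee members trained on different labeled subsets.
    votes = committee_probs.argmax(axis=2)          # each member's vote
    n_models, n_items = votes.shape
    n_classes = committee_probs.shape[2]
    scores = np.zeros(n_items)
    for c in range(n_classes):
        frac = (votes == c).mean(axis=0)            # vote share per item
        nz = frac > 0
        scores[nz] -= frac[nz] * np.log(frac[nz])   # entropy of the votes
    return scores  # higher disagreement -> better labeling candidate
```

Items with the highest vote entropy would then be surfaced first in the next visualization data set.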

In various embodiments, a label provider need not necessarily provide labels for all the items selected for the label provider before a new visualization data set is generated and presented. In some embodiments, during a given session, a label provider may request (e.g., via various filtering criteria) that data items for which labels have already been generated, or for which class predictions have already been generated, be presented to the label provider—that is, not all the data items presented to a label provider may necessarily be candidates for labeling feedback; instead, some items may help the label provider learn about what has already been done during the training workflow underway, which may in turn help the label provider provide (or correct) at least some labels. Some search terms or search query predicates may be recommended by the service in various embodiments, e.g., based on the detection of correlations between such terms/predicates and membership in various classes. In one embodiment, label providers may create user-defined labels, e.g., for temporary grouping of some set of data items, and use such user-defined labels to filter the data items during a session. Bulk labeling of data items (e.g., using a “label all” interface element) may be used to reduce the number of individual mouse clicks or other interactions required from label providers in at least some embodiments. In at least some embodiments, in addition to a label for a given data item, a label provider may provide a justification or reason why a particular label was selected (or why a previously-assigned label was changed). In one embodiment, when a data item for which a target class has been predicted by a classifier is presented via the interactive interface, a justification for the prediction may also be provided, helping the label provider understand the reasons for the prediction (which in turn may help the label provider label other similar items, for example). In at least one embodiment, a graphical representation of a statistical distribution of data items (such as a ribbon of the kind discussed earlier) for which predictions have been generated may be presented to a label provider during a feedback session. The label provider may use an element of the programmatic interface to provide an indication of a selected sub-range of class scores of the distribution as a filtering criterion in such an embodiment, causing representations of data items with predicted class scores in the sub-range to be presented. Thus, the graphical display of predicted class distributions may provide another type of filtering capability in at least some embodiments. The graphical display of predicted class scores may indicate a current set of class boundaries (i.e., the scores used to distinguish among classes, with a selected confidence level, as of a recent training iteration) in at least some embodiments.
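The score-distribution ribbon mentioned above could be backed by a simple histogram of predicted class scores annotated with the current class boundary, as in this sketch; the fixed [0, 1] score range and the precomputed boundary value are assumptions.

```python
import numpy as np

def ribbon_payload(scores, class_boundary=0.5, n_bins=20):
    # scores: predicted positive-class scores from the latest iteration.
    # class_boundary is a stand-in for the boundary the service would
    # derive, at a chosen confidence level, from recent training.
    counts, edges = np.histogram(scores, bins=n_bins, range=(0.0, 1.0))
    return {"bin_counts": counts.tolist(),
            "bin_edges": edges.tolist(),
            "class_boundary": class_boundary}
```

A sub-range of bins selected by the label provider then maps back to the score-range filter sketched earlier.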

In some embodiments in which multiple labeling feedback sessions are set up with respective label providers, the service may analyze the feedback (e.g., labels, search terms and other filtering criteria, etc.) obtained from individual ones of the label providers, and tailor the sets of data items to individual label providers based on such analysis. In effect, respective capability and/or interest profiles may be set up for individual ones of the label providers based on the work they have done thus far, and such profiles may be included in the set of factors (along with other factors such as estimated learning contributions, etc.) used to identify data items for presentation to the label providers. For example, if a given label provider has been found to be especially proficient and quick at accurately labeling items of target class C1, but not as proficient at accurately labeling items of target class C2, items that are estimated to have a higher probability of being labeled as C1 may be preferentially selected for the label provider.
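A capability profile of this kind might, for example, be approximated by per-class labeling accuracy computed from a provider's past labels, assuming adjudicated labels are available for comparison; the following hypothetical sketch shows one such approximation.

```python
from collections import defaultdict

def provider_accuracy_by_class(history):
    # history: iterable of (target_class, provider_label, adjudicated_label)
    # tuples; adjudicated labels are assumed available from review or
    # consensus among providers.
    correct, total = defaultdict(int), defaultdict(int)
    for target_class, given, truth in history:
        total[target_class] += 1
        correct[target_class] += given == truth
    return {c: correct[c] / total[c] for c in total}

def pick_provider(item_class_probs, profiles):
    # Prefer the provider estimated to be most accurate on the item's
    # most likely class (one simple stand-in for the tailoring above).
    likely = max(item_class_probs, key=item_class_probs.get)
    return max(profiles, key=lambda p: profiles[p].get(likely, 0.0))
```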

FIG. 28 is a flow diagram illustrating aspects of operations that may be performed to present visual representations of training status indicators during classifier training, according to at least some embodiments. In the depicted embodiment, a set of parameters (as well as meta-parameters) of a classifier training and evaluation effort may be determined (element 2801). At least some of the parameters or meta-parameters may be obtained via requests submitted via a programmatic interface, e.g., of a classification service of the kind discussed above and/or a broader machine learning service in some embodiments. Other parameters may be selected by the service itself in various embodiments. Based on the information obtained regarding the classification tasks to be performed, a set of data sources and associated retrieval techniques may be determined, the type of machine learning models to be used for the classification and/or active learning may be identified, class definitions of the target classes may be obtained and stored, and session control information (e.g., the number of guided labeling feedback sessions to be used) and the like may be identified in the depicted embodiment. The specific set of resources to be used for training and evaluation may be identified as well at the service (element 2804), e.g., from a pool of such resources available at the service or accessible from the service.

Training iterations of the kind discussed above may be initiated (element 2807), along with the accompanying asynchronous guided labeling feedback sessions in the depicted embodiment. As more training iterations are performed, additional labels for data items that have been identified as likely to contribute to faster or more goal-directed learning may be obtained asynchronously from label providers in the feedback sessions, gradually increasing the training set size in various embodiments. As discussed earlier, the set of labeling candidates presented to a given label provider may be ordered based at least in part on a respective rank, with respect to estimated learning contribution, associated with including individual ones of the labeling candidates in a training set for a particular training iteration of one or more classification models in some embodiments.

A number of metrics may be collected with regard to individual training iterations as well as for sequences of training iterations, and such metrics may be used to generate training status indicators that can be presented visually to users/clients of the service, such as data scientists that may wish to analyze/debug the training progress in various embodiments. Corresponding to individual ones of a plurality of classifier training iterations whose training sets include labels obtained via the feedback sessions, respective sets of metrics, status indicators and/or diagnosis tests may be identified in various embodiments (element 2810). Such indicators may include, among others, (a) a representation of a fraction of a group of data items for which classification results that have been obtained in a particular classifier training iteration do not meet a threshold criterion and/or (b) a representation of a stability trend of a particular training metric over a plurality of classifier training iterations in some embodiments. The particular set of metrics for which status information is to be collected and/or displayed may vary, e.g., based on input received from the users of the service in various embodiments—that is, different users may specify respective sets of metrics and status information to be derived from the metrics. In some embodiments, the service may itself select at least a subset of the metrics whose status is to be indicated, e.g., based on the types of classifier (binary or multi-class) being developed. Any combination of a variety of metrics may be collected and presented in different embodiments, including for example (a) a positive predictive value (PPV), (b) a negative predictive value (NPV), (c) an accuracy, (d) a prevalence, (e) a precision, (f) a false discovery rate, (g) a false omission rate, (h) a recall, (i) a sensitivity, (j) a diagnostic odds ratio, and/or (k) an F1 score. In at least one embodiment, the metrics collected and presented may include a count of the number of labels that have been obtained for one or more target classes.
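For a binary classifier, all of the listed metrics can be derived from a single confusion matrix; the sketch below shows the standard formulas, guarded against empty cells, as one plausible way a service could compute these status indicators.

```python
def binary_status_metrics(tp, fp, tn, fn):
    # Standard confusion-matrix derivations, guarded against empty cells.
    total = tp + fp + tn + fn
    ppv = tp / (tp + fp) if tp + fp else 0.0      # precision
    npv = tn / (tn + fn) if tn + fn else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0   # sensitivity
    return {
        "ppv": ppv,
        "npv": npv,
        "accuracy": (tp + tn) / total if total else 0.0,
        "prevalence": (tp + fn) / total if total else 0.0,
        "false_discovery_rate": 1.0 - ppv,
        "false_omission_rate": 1.0 - npv,
        "recall": recall,
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn) if fp * fn else float("inf"),
        "f1": 2 * ppv * recall / (ppv + recall) if ppv + recall else 0.0,
    }
```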

Visualization data sets comprising the status indicators (and/or results of diagnosis tests) may be prepared at the service, e.g., automatically at selected time intervals, and/or in response to requests submitted via the interactive interfaces being implemented by the service in various embodiments (element 2813). The presentation of a given visualization data set may include various types of panels and layout components in different embodiments. In at least one implementation, a first display component may include (a) respective values of a plurality of selected status indicators as of a first classifier training iteration and (b) a plurality of values of an individual status indicator as of respective successive classifier training iterations. As such, a viewer of the presented data may be able to easily see the values of various metrics for a given training iteration or point in time, and may also be able to see how one or more of the metrics changed across multiple training iterations in such embodiments. The interactive interface used for presenting the status information may, for example, include zoom-in capabilities, temporal correlation elements (e.g., elements which can be used to simultaneously inspect values for a number of different metrics/status indicators at a given point in time), and so on. Such information may be used to debug/analyze the model being trained in various embodiments, to initiate modifications of model meta-parameters, etc.
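A display component of this kind might be fed by a payload pairing the latest values of the selected indicators with their per-iteration series, as in the brief sketch below; the metric-history mapping is an assumed storage layout, not the service's actual schema.

```python
def status_display_payload(metric_history, iteration, selected_metrics):
    # metric_history: metric name -> list of per-iteration values (an
    # assumed storage layout).
    current = {m: metric_history[m][iteration] for m in selected_metrics}
    # Per-metric series up to this iteration, for the trend view.
    series = {m: metric_history[m][:iteration + 1] for m in selected_metrics}
    return {"current": current, "series": series}
```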

In at least one embodiment, users may be able to view the specific labels and/or associated justification information provided by an individual label provider, and/or to view the differences between training sets of a pair of training iterations. In one embodiment, the service may provide a visual indication of explanatory factors associated with a change in a training metric between one training iteration and another—e.g., a set of terms that were present in data items that were added to the training set between iterations I1 and I2, and that are highly correlated with membership in a particular class, may be displayed.
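One simple way to surface such explanatory factors is to compare the class skew of terms appearing in items added between the two iterations, as in this rough sketch; the tokenized, binary-labeled item representation is an assumption.

```python
from collections import Counter

def explanatory_terms(added_items, min_count=3, top_n=10):
    # added_items: (tokens, label) pairs for items present in iteration
    # I2's training set but not I1's; binary labels are assumed.
    pos, neg = Counter(), Counter()
    for tokens, label in added_items:
        (pos if label == 1 else neg).update(set(tokens))
    skew = {}
    for term, c in pos.items():
        if c + neg[term] >= min_count:
            skew[term] = c / (c + neg[term])  # share of occurrences in class 1
    return sorted(skew, key=skew.get, reverse=True)[:top_n]
```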

In some embodiments, after a visualization of a particular data set comprising status information is presented, one or more training enhancement actions may be initiated, e.g., based on objectives associated with the status indicators that were shown (element 2816). A number of different types of enhancement actions intended to accelerate the training process may be initiated in different embodiments, such as expanding/contracting the pool of label providers, selecting label candidates that are expected to help meet specific objectives associated with the status indicators (such as increasing the coverage for various target classes), customizing the label candidates transmitted to different label providers, modifying the training iteration intervals, and so on. In effect, objectives associated with one or more specific metrics whose status indicators are presented to users may help to guide the subsequent iterations of training in the depicted embodiment. In some embodiments, the service may present alternative training enhancement actions that may be undertaken, and a client of the service may select the particular actions to be implemented. In other embodiments, a client may indicate a particular status indicator or metric as having a higher priority than others, and such guidance may be used to identify the specific actions to be initiated.
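As one example of a coverage-oriented enhancement action, the service could identify target classes whose labeled-example counts lag an objective and then prioritize unlabeled candidates likely to belong to those classes, as sketched below; the per-class goal and the probability layout are illustrative assumptions.

```python
from collections import Counter

def lagging_classes(labels, target_classes, per_class_goal):
    # Classes whose labeled-example counts fall short of the objective.
    counts = Counter(labels)
    return [c for c in target_classes if counts.get(c, 0) < per_class_goal]

def prioritize_candidates(candidates, class_probs, lagging):
    # Rank unlabeled candidates by their probability of belonging to a
    # lagging class; class_probs[i] maps class name -> probability.
    if not lagging:
        return list(candidates)
    order = sorted(range(len(candidates)),
                   key=lambda i: max(class_probs[i].get(c, 0.0) for c in lagging),
                   reverse=True)
    return [candidates[i] for i in order]
```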

As the iterations proceed, more status indicators may be gathered and the presentation may be updated. In some embodiments, a set of metrics-based diagnosis tests may be selected (e.g., by the service itself or based on programmatic input from users), such that the results of the tests may be used to determine whether further training iterations should be scheduled or not. In various embodiments in which such diagnosis tests are identified, the current results (and/or trends in the results) of the diagnosis tests may be displayed visually, and/or an indication of the number or fraction of diagnosis tests that have been passed may be provided visually. The results of the diagnosis tests may be used, for example, to automatically determine whether additional training iterations are to be initiated. The different diagnosis tests may be prioritized relative to one another in some embodiments, e.g., based on programmatic input from users, and such prioritization may further help identify training enhancement actions in at least one embodiment. In some embodiments, if results of a particular diagnosis test are not yet satisfactory, or show a trend that is unsatisfactory, a remedial action with respect to the test may be initiated (similar to the training enhancement actions mentioned above), e.g., either automatically by the service or in response to programmatic requests. In at least some embodiments, a user may submit a request to view data items for which the label indicated by a label provider differs from the class label predicted by the latest version of the classifier, e.g., in order to help debug the model.
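A metrics-based diagnosis test might, for instance, combine a threshold check with a stability check over recent iterations, and the continue/stop decision could honor an authorized override; the following sketch shows one such formulation, with illustrative window and tolerance values.

```python
def diagnosis_passed(metric_series, threshold, window=3, max_delta=0.01):
    # Pass if the metric meets its threshold and has been stable over the
    # last few iterations; window and tolerance are illustrative.
    if len(metric_series) < window:
        return False
    recent = metric_series[-window:]
    stable = max(recent) - min(recent) <= max_delta
    return recent[-1] >= threshold and stable

def schedule_more_iterations(test_results, override=None):
    # Continue training until every diagnosis test passes, unless an
    # authorized override (as noted below) decides otherwise.
    if override is not None:
        return override
    return not all(test_results.values())
```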

Eventually, the training iterations may be terminated, and a trained version of the classifier model or models (whose training set included labels obtained as a result of implementation of one or more of the training enhancement actions) may be stored and/or used to provide predicted classifications of new data items to one or more destinations in the depicted embodiment (element 2822). In at least one embodiment in which diagnosis tests are used to determine when to terminate training, an authorized user may override the diagnosis-test-based decision—e.g., a classifier may be approved for production use even if one or more diagnosis tests have not yet been passed, or, alternatively, additional model training iterations may be scheduled even if all the selected diagnosis tests have been passed.

It is noted that in various embodiments, some of the operations shown in FIG. 26, FIG. 27 or FIG. 28 may be implemented in a different order than that shown in the figure, or may be performed in parallel rather than sequentially. Additionally, some of the operations shown in FIG. 26, FIG. 27 or FIG. 28 may not be required in one or more implementations.

Use Cases

The techniques described above, of implementing a classification service or tool which can be used to quickly develop classifiers of desired quality levels using a flexible interactive interface, may be extremely useful in a variety of scenarios. More and more business problems are being solved with the help of machine learning techniques, among which classification (both binary and multi-class classification) is a very frequently used technique. In order to develop a classifier, labeled training data is required, and the labeling effort (which usually involves human labelers) may often represent a significant fraction of the resources, cost and time associated with the classifier development workflow as a whole. In the techniques described, training may be initiated using a very small training set. Results of the training iterations may be used, e.g., via active learning methodologies, to quickly identify more “useful” unlabeled data, for which labels may be obtained via asynchronous labeling sessions with some set of label providers. As more and more labels for useful data items are accumulated, the quality of the classifiers produced in the training iterations may increase rapidly. The interactive user interface supported may enable at least three groups of entities involved in the classification effort to perform their tasks more quickly and effectively: (a) stakeholders responsible for starting, managing and terminating classification efforts, (b) label providers, who may have little machine learning expertise, and (c) data scientists or analysts who wish to monitor the progress of classifier development, debug problems, and so on. Using the combination of the streamlined customizable back-end classifier development workflow and the interactive interface, orders-of-magnitude reductions in the overall resources and time consumed for obtaining high-quality classification models may be achieved in some cases.

Illustrative Computer System

In at least some embodiments, a server that implements a portion or all of one or more of the technologies described herein, including the techniques for various front-end and/or back-end components of a classification service or tool, may include a general-purpose computer system that includes or is configured to access one or more computer-accessible media. FIG. 29 illustrates such a general-purpose computing device 9000. In the illustrated embodiment, computing device 9000 includes one or more processors 9010 coupled to a system memory 9020 (which may comprise both non-volatile and volatile memory modules) via an input/output (I/O) interface 9030. Computing device 9000 further includes a network interface 9040 coupled to I/O interface 9030.

In various embodiments, computing device 9000 may be a uniprocessor system including one processor 9010, or a multiprocessor system including several processors 9010 (e.g., two, four, eight, or another suitable number). Processors 9010 may be any suitable processors capable of executing instructions. For example, in various embodiments, processors 9010 may be general-purpose or embedded processors implementing any of a variety of instruction set architectures (ISAs), such as the x86, PowerPC, SPARC, or MIPS ISAs, or any other suitable ISA. In multiprocessor systems, each of processors 9010 may commonly, but not necessarily, implement the same ISA. In some implementations, graphics processing units (GPUs) may be used instead of, or in addition to, conventional processors.

System memory 9020 may be configured to store instructions and data accessible by processor(s) 9010. In at least some embodiments, the system memory 9020 may comprise both volatile and non-volatile portions; in other embodiments, only volatile memory may be used. In various embodiments, the volatile portion of system memory 9020 may be implemented using any suitable memory technology, such as static random access memory (SRAM), synchronous dynamic RAM or any other type of memory. For the non-volatile portion of system memory (which may comprise one or more NVDIMMs, for example), in some embodiments flash-based memory devices, including NAND-flash devices, may be used. In at least some embodiments, the non-volatile portion of the system memory may include a power source, such as a supercapacitor or other power storage device (e.g., a battery). In various embodiments, memristor-based resistive random access memory (ReRAM), three-dimensional NAND technologies, Ferroelectric RAM, magnetoresistive RAM (MRAM), or any of various types of phase change memory (PCM) may be used at least for the non-volatile portion of system memory. In the illustrated embodiment, program instructions and data implementing one or more desired functions, such as those methods, techniques, and data described above, are shown stored within system memory 9020 as code 9025 and data 9026.

In one embodiment, I/O interface 9030 may be configured to coordinate I/O traffic between processor 9010, system memory 9020, and any peripheral devices in the device, including network interface 9040 or other peripheral interfaces such as various types of persistent and/or volatile storage devices. In some embodiments, I/O interface 9030 may perform any necessary protocol, timing or other data transformations to convert data signals from one component (e.g., system memory 9020) into a format suitable for use by another component (e.g., processor 9010). In some embodiments, I/O interface 9030 may include support for devices attached through various types of peripheral buses, such as a variant of the Peripheral Component Interconnect (PCI) bus standard or the Universal Serial Bus (USB) standard, for example. In some embodiments, the function of I/O interface 9030 may be split into two or more separate components, such as a north bridge and a south bridge, for example. Also, in some embodiments some or all of the functionality of I/O interface 9030, such as an interface to system memory 9020, may be incorporated directly into processor 9010.

Network interface 9040 may be configured to allow data to be exchanged between computing device 9000 and other devices 9060 attached to a network or networks 9050, such as other computer systems or devices as illustrated in FIG. 1 through FIG. 28, for example. In various embodiments, network interface 9040 may support communication via any suitable wired or wireless general data networks, such as types of Ethernet network, for example. Additionally, network interface 9040 may support communication via telecommunications/telephony networks such as analog voice networks or digital fiber communications networks, via storage area networks such as Fibre Channel SANs, or via any other suitable type of network and/or protocol.

In some embodiments, system memory 9020 may be one embodiment of a computer-accessible medium configured to store program instructions and data as described above for FIG. 1 through FIG. 28 for implementing embodiments of the corresponding methods and apparatus. However, in other embodiments, program instructions and/or data may be received, sent or stored upon different types of computer-accessible media. Generally speaking, a computer-accessible medium may include non-transitory storage media or memory media such as magnetic or optical media, e.g., disk or DVD/CD coupled to computing device 9000 via I/O interface 9030. A non-transitory computer-accessible storage medium may also include any volatile or non-volatile media such as RAM (e.g. SDRAM, DDR SDRAM, RDRAM, SRAM, etc.), ROM, etc., that may be included in some embodiments of computing device 9000 as system memory 9020 or another type of memory. Further, a computer-accessible medium may include transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link, such as may be implemented via network interface 9040. Portions or all of multiple computing devices such as that illustrated in FIG. 29 may be used to implement the described functionality in various embodiments; for example, software components running on a variety of different devices and servers may collaborate to provide the functionality. In some embodiments, portions of the described functionality may be implemented using storage devices, network devices, or special-purpose computer systems, in addition to or instead of being implemented using general-purpose computer systems. The term “computing device”, as used herein, refers to at least all these types of devices, and is not limited to these types of devices.

Embodiments of the disclosure can be described in view of the following clauses:

-   -   1. A system, comprising:    -   one or more computing devices of an artificial        intelligence-based classification service;    -   wherein the one or more computing devices are configured to:        -   perform one or more training iterations until a training            completion criterion is met, wherein a particular training            iteration comprises at least:            -   obtaining, via an interactive programmatic interface,                respective class labels for at least some data items of                a particular set of data items identified as candidates                for labeling feedback in a previous training iteration,                wherein at least some class labels of the respective                class labels are obtained asynchronously with respect                to (a) a start of the particular training iteration                and (b) an end of the previous training iteration;            -   generating, using one or more classifiers,                classification predictions corresponding to a test set,                wherein an individual classifier of the one or more                classifiers is trained using a training set that                includes at least some labels obtained using the                interactive programmatic interface; and            -   identifying, based at least in part on (a) the                classification predictions and (b) an active learning                algorithm, another set of data items as candidates for                labeling feedback with respect to the next training                iteration; and        -   provide, after the training completion criterion has been            met, a respective classification prediction obtained from a            particular classifier with respect to one or more data            items, wherein the particular classifier was trained using a            particular training set, wherein labels for at least some            items of the particular training set were obtained in the            one or more training iterations.    -   2. The system as recited in clause 1, wherein the one or more        computing devices are configured to:    -   generate a first machine learning model to identify at least a        first attribute value of one or more data items, such that a        correlation between the presence of the attribute value and a        variation in classification prediction of the one or more data        items exceeds a threshold; and    -   identify, using the first attribute, at least one data item as a        candidate for labeling feedback in the particular training        iteration.    -   3. The system as recited in any of clauses 1-2, wherein the one        or more computing devices are configured to:    -   include, as a candidate for labeling feedback in the particular        training iteration, a data item for which a label has been        obtained from a label provider.    -   4. The system as recited in any of clauses 1-3, wherein the        active learning algorithm comprises one or more of: (a) a        query-by-committee algorithm, (b) an uncertainty sampling        algorithm, (c) an expected model change algorithm, (d) an        expected error reduction algorithm, (e) a variance-reduction        algorithm, and/or (f) a density-weighted algorithm.    -   5. 
The system as recited in any of clauses 1-4, wherein        identifying the other set of data items as candidates is based        at least in part on a filter criterion indicated by a label        provider.    -   6. A method, comprising:    -   performing, by one or more computing devices:        -   one or more classifier training iterations until a training            completion criterion is met, wherein a particular classifier            training iteration comprises at least:            -   obtaining, via an interactive interface, asynchronously                with respect to a start of the particular classifier                training iteration, respective class labels for at least                some data items of a particular set of data items                identified as candidates for labeling feedback in an                earlier classifier training iteration;            -   identifying, based at least in part on an analysis of                classification predictions generated using one or more                classifiers whose training set includes at least one                label obtained using the interactive interface, another                set of data items as candidates for labeling feedback                with respect to the next training iteration; and            -   storing a classifier trained using a particular training                set, wherein labels for at least some items of the                particular training set were obtained in the one or more                training iterations.    -   7. The method as recited in clause 6, wherein the one or more        classifiers comprise a first classifier and a second classifier,        wherein the training set of the first classifier differs from        the training set of the second classifier, and wherein the        analysis of classification predictions comprises generating a        measure of variation between respective classification        predictions generated with respect to a particular data item by        individual ones of the one or more classifiers.    -   8. The method as recited in any of clauses 6-7, further        comprising performing, by the one or more computing devices:    -   obtaining an indication, via the interactive interface, of a        level of automation to be implemented at one or more stages of a        classification workflow which includes the one or more        classifier training iterations; and    -   determining, based at least in part on the level of automation,        a classification algorithm to be used for the particular        classifier.    -   9. The method as recited in any of clauses 6-8, wherein        obtaining the respective class labels comprises:    -   obtaining, subsequent to detecting that a first submit request        has been received via the interactive interface, a first group        of class labels, wherein individual labels of the first group of        class labels are assigned to respective data items of a first        group of data items; and    -   obtaining, subsequent to detecting that a second submit request        has been received via the interactive interface, a second group        of class labels, wherein individual labels of the second group        of class labels are assigned to respective data items of a        second group of data items.    -   10. 
The method as recited in clause 9, further comprising        performing, by the one or more computing devices:    -   in response to detecting that the first submit request has been        received, causing a representation of the second group of data        items to be presented via the interactive interface.    -   11. The method as recited in any of clauses 6-9, further        comprising performing, by the one or more computing devices:    -   selecting, based at least in part on a filtering criterion        indicated via the interactive interface, at least one data item        as a candidate for labeling feedback.    -   12. The method as recited in any of clauses 6-9 or 11, further        comprising performing, by the one or more computing devices:    -   determining, based at least in part on input received via a        programmatic interface, an indication of (a) a data source        and (b) a classification objective of the one or more classifier        training iterations;    -   retrieving, from the data source, the particular set of data        items; and    -   providing, via the interactive interface, an indication to one        or more label submitters of the classification objective.    -   13. The method as recited in any of clauses 6-9 or 11-12,        further comprising performing, by the one or more computing        devices:    -   determining, based at least in part on input received via a        programmatic interface, one or more feature processing        operations to be performed on the particular set of data items;        and    -   including, in input provided to the one or more classifiers,        results of the feature processing operations.    -   14. The method as recited in any of clauses 6-9 or 11-13,        further comprising performing, by the one or more computing        devices:    -   determining, based at least in part on input received via a        programmatic interface, a model type of at least one classifier        of the one or more classifiers, wherein the model type comprises        one or more of: (a) a logistic regression model or (b) a neural        network-based model.    -   15. The method as recited in any of clauses 6-9 or 11-14,        further comprising performing, by the one or more computing        devices:    -   determining the number of label providers to include in a set of        label providers to be utilized for at least the particular        training iteration; and    -   determining, based at least in part on analysis of earlier        interactions with individual members of the set of label        providers, a respective group of data items to be presented as        candidates for labeling feedback to individual members of the        set of label providers in the particular training iteration.    -   16. 
A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors cause the one or more processors to:
    -   perform one or more classifier training iterations until a training completion criterion is met, wherein a particular classifier training iteration comprises at least:
        -   obtaining, via an interactive interface, asynchronously with respect to a start of the particular classifier training iteration, respective class labels for at least some data items of a particular set of data items identified as candidates for labeling feedback in an earlier classifier training iteration; and
        -   identifying, based at least in part on an analysis of classification predictions generated using one or more classifiers whose training set includes at least one label obtained via the interactive interface, another set of data items as candidates for labeling feedback with respect to the next training iteration; and
    -   store a classifier trained using a particular training set, wherein labels for at least some items of the particular training set were obtained in the one or more training iterations.
    -   17. The non-transitory computer-accessible storage medium as recited in clause 16, wherein the instructions when executed on one or more processors cause the one or more processors to:
    -   generate a first machine learning model to identify at least a first attribute value of one or more data items, such that a correlation between the presence of the attribute value and a variation in classification prediction of the one or more data items exceeds a threshold; and
    -   identify, using the first attribute, at least one data item as a candidate for labeling feedback in the particular classifier training iteration.
    -   18. The non-transitory computer-accessible storage medium as recited in any of clauses 16-17, wherein the one or more classifiers whose predictions are used to identify the other set of candidate data items comprise a first classifier and a second classifier, wherein the instructions when executed on the one or more processors cause the one or more processors to:
    -   generate, for a third classifier, a training set that includes (a) at least some data items of a training set of the first classifier and (b) at least some data items of a training set of the second classifier;
    -   obtain, as part of the particular classifier training iteration, classification results from the third classifier for one or more data items;
    -   utilize the classification results obtained from the third classifier to identify one or more attribute values whose correlation with predicted membership in a class exceeds a threshold; and
    -   provide an indication of the one or more attribute values via the interactive programmatic interface.
    -   19. The non-transitory computer-accessible storage medium as recited in any of clauses 16-18, wherein the instructions when executed on one or more processors cause the one or more processors to perform the one or more classifier training iterations based at least in part on determining that a programmatic request has been received at a network-accessible service of a provider network.
    -   20. 
The non-transitory computer-accessible storage medium as        recited in any of clauses 16-19, wherein identifying the other        set of data items as candidates is based at least in part on a        filter criterion indicated by a label provider.

Embodiments of the disclosure can also be described in view of the following clauses:

-   -   1. A system, comprising:    -   one or more computing devices of an artificial        intelligence-based classification service; wherein the one or        more computing devices are configured to:        -   initiate a labeling feedback session with a label provider;        -   cause, during the labeling feedback session, one or more            visualization data sets to be presented to the label            provider via an interactive programmatic interface,            including a particular visualization data set which            comprises an ordered representation of one or more data            items for which labeling feedback is requested, wherein the            order in which the one or more data items are arranged is            based at least in part on a respective rank, with respect to            estimated learning contribution, associated with including            individual ones of the data items in a training set for a            particular training iteration of one or more classification            models, and wherein a presented view of a first data item of            the one or more data items indicates a particular attribute            of the first data item whose correlation with a particular            predicted class exceeds a threshold;        -   obtain, during the feedback session, indications from the            label provider via the interactive programmatic interface            of (a) respective labels for one or more data items of the            one or more visualization data sets and (b) a filter            criterion to be used to select one or more other data items            to be presented via the interactive programmatic interface;            and        -   provide, to one or more destinations, a classification            prediction corresponding to a data item, wherein the            classification prediction is obtained from a classification            model trained using a training set which includes at least            one label of the respective labels.    -   2. The system as recited in clause 1, wherein at least one label        of the respective labels is obtained asynchronously with respect        to one or more training iterations of the one or more        classification models.    -   3. The system as recited in any of clauses 1-2, wherein the        filter criterion indicates a particular class label.    -   4. The system as recited in clause 3, wherein a classification        model of the one or more classification models is configured to        predict, for a particular data item, a class label selected from        a first set of target class labels, and wherein the particular        class label (a) is not a member of the first set and (b) is        assigned to a second data item via the interactive programmatic        interface as a user-defined temporary class label.    -   5. The system as recited in any of clauses 1-3, wherein the one        or more computing devices are configured to:    -   provide an indication, via the interactive programmatic        interface, of a plurality of recommended search query predicates        identified based at least in part on analysis of one or more        classifier training metrics.    -   6. 
A method, comprising:
    -   performing, by one or more computing devices:
        -   causing, during a first labeling feedback session of one or more labeling feedback sessions, a first visualization data set to be presented via an interactive programmatic interface, wherein the first visualization data set comprises a representation of one or more data items for which labeling feedback is requested for generating a training set of one or more classifiers, wherein at least a first data item of the one or more data items is identified based at least in part on an estimated rank, with respect to one or more metrics, associated with including the first data item in a training set;
        -   obtaining, via the interactive programmatic interface during the first labeling feedback session, (a) respective labels for the one or more data items, including the first data item, represented in the first visualization data set and (b) a filter criterion to be used to select one or more other data items to be presented via the interactive programmatic interface; and
        -   storing a classifier trained using a training set which includes at least one label of the respective labels.
    -   7. The method as recited in clause 6, wherein in the presentation of the first visualization data set, a representation of the first data item indicates a particular attribute of the first data item whose correlation with a particular predicted class exceeds a threshold.
    -   8. The method as recited in any of clauses 6-7, wherein the first data item has a plurality of attributes including a first attribute and a second attribute, and wherein in the representation of the first data item, a first color is used to indicate that the first attribute is correlated with a first target class of the one or more classifiers, and a second color is used to indicate that the second attribute is correlated with a second target class of the one or more classifiers.
    -   9. The method as recited in any of clauses 6-8, wherein the indication of at least one label is obtained asynchronously with respect to one or more classifier training iterations.
    -   10. The method as recited in any of clauses 6-9, wherein the filtering criterion indicates a particular class label associated with the one or more other data items.
    -   11. The method as recited in any of clauses 6-10, further comprising performing, by the one or more computing devices:
    -   obtaining an indication, via the interactive interface, of a level of automation to be implemented at one or more stages of a classification workflow which includes the first labeling feedback session; and
    -   determining, based at least in part on the level of automation, respective thresholds for one or more metrics to be used to terminate the classification workflow.
    -   12. The method as recited in any of clauses 6-11, wherein the filtering criterion indicates a first search query predicate.
    -   13. 
The method as recited in clause 12, further comprising performing, by the one or more computing devices:
    -   providing an indication, via the interactive programmatic interface, of a plurality of recommended search query predicates identified based at least in part on analysis of one or more classifier training metrics, wherein the plurality of recommended search query predicates includes the first search query predicate.
    -   14. The method as recited in any of clauses 6-12, wherein the one or more data items for which labeling feedback is requested comprise a second data item for which a label was obtained earlier via the interactive programmatic interface, the method further comprising performing, by the one or more computing devices:
    -   selecting the second data item for presentation for label reconsideration, based at least in part on a determination of a difference between the label obtained for the second data item via the interactive programmatic interface, and a predicted class score of the second data item.
    -   15. The method as recited in any of clauses 6-12 or 14, further comprising performing, by the one or more computing devices:
    -   obtaining, during the first labeling feedback session, an indication via the interactive programmatic interface of a justification for assignment of a first label to the first data item; and
    -   causing the justification to be displayed via the interactive programmatic interface to one or more entities.
    -   16. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors cause the one or more processors to:
    -   cause, during a first labeling feedback session of one or more labeling feedback sessions, a first visualization data set to be presented via an interactive programmatic interface, wherein the first visualization data set comprises a representation of one or more data items for which labeling feedback is requested for generating a training set of one or more classifiers, wherein at least a first data item of the one or more data items is identified based at least in part on an estimated rank, with respect to one or more metrics, associated with including the first data item in a training set;
    -   obtain, via the interactive programmatic interface during the first labeling feedback session, (a) respective labels for the one or more data items of the first visualization data set and (b) a filter criterion to be used to select one or more other data items to be presented via the interactive programmatic interface; and
    -   store a classifier trained using a training set which includes at least one label of the respective labels.
    -   17. The non-transitory computer-accessible storage medium as recited in clause 16, wherein the instructions when executed on the one or more processors cause the one or more processors to:
    -   cause a graphical representation of a statistical distribution of classified data items to be presented via the interactive programmatic interface.
    -   18. 
The non-transitory computer-accessible storage medium as recited in any of clauses 16-17, wherein the instructions when executed on the one or more processors cause the one or more processors to:
    -   obtain an indication, via the interactive programmatic interface, of a selected sub-range of class scores of the statistical distribution; and
    -   cause representations of one or more data items with predicted class scores in the selected sub-range to be displayed via the interactive programmatic interface.
    -   19. The non-transitory computer-accessible storage medium as recited in any of clauses 16-18, wherein the instructions when executed on the one or more processors cause the one or more processors to:
    -   cause an indication of a predicted classification score corresponding to a class boundary identified with a particular confidence level to be included in the graphical representation of the statistical distribution.
    -   20. The non-transitory computer-accessible storage medium as recited in any of clauses 16-19, wherein the first labeling session is initiated with a first label provider, wherein the instructions when executed on the one or more processors cause the one or more processors to:
    -   identify, for a second labeling session initiated with a second label provider, a particular collection of data items to be labeled, wherein members of the particular collection are selected based at least in part on an analysis of labels provided earlier by the second label provider.
    -   21. The non-transitory computer-accessible storage medium as recited in any of clauses 16-20, wherein the estimated rank is determined based at least in part on an active learning algorithm, wherein the active learning algorithm comprises one or more of: (a) a query-by-committee algorithm, (b) an uncertainty sampling algorithm, (c) an expected model change algorithm, (d) an expected error reduction algorithm, (e) a variance-reduction algorithm, or (f) a density-weighted algorithm.
    -   22. The non-transitory computer-accessible storage medium as recited in any of clauses 16-21, wherein the first visualization data set comprises an indication that at least a portion of an attribute of a particular data item is correlated with a particular target class of a classifier of the one or more classifiers, wherein the indication comprises a highlighting of one or more of: (a) a text token, (b) at least a portion of an image, (c) at least a portion of a video, or (d) at least a portion of an audio recording.

Embodiments of the disclosure can also be described in view of the following clauses:

-   -   1. A system, comprising:    -   one or more computing devices of an artificial        intelligence-based classification service;    -   wherein the one or more computing devices are configured to:        -   determine, corresponding to individual ones of a plurality            of classifier training iterations, respective sets of status            indicators, wherein a first set of status indicators            includes at least (a) a representation of a fraction of a            first group of data items for which classification results            that have been obtained in a particular classifier training            iteration do not meet a threshold criterion and (b) a            representation of a stability trend of a particular training            metric over a plurality of classifier training iterations,            wherein a training data set of the particular classifier            training iteration comprises at least some labels obtained            in response to a presentation of one or more data items of            the first group as candidates for labeling feedback;        -   cause, in response to a programmatic request, a first            visualization data set comprising at least one set of status            indicators to be presented via an interactive programmatic            interface, wherein presentation of the first visualization            data set includes an indication, within a first display,            of (a) respective values of a plurality of selected status            indicators as of a first classifier training iteration            and (b) a plurality of values of an individual status            indicator as of respective successive classifier training            iterations;        -   initiate, subsequent to presentation of the first            visualization data set, one or more training enhancement            actions, wherein a particular training enhancement action            comprises selecting, based at least in part on an objective            associated with a particular status indicator, one or more            data items for which respective labeling feedback is to be            obtained programmatically during one or more of the            classifier training iterations; and        -   provide, to one or more destinations, a classification            prediction corresponding to a particular data item, wherein            the classification prediction is obtained from a            classification model trained using a training set which            includes at least one label obtained as a result of            implementation of the particular training enhancement            action.    -   2. The system as recited in clause 1, wherein the one or more        computing devices are configured to:    -   identify, based at least in part on input received        programmatically, an indication of a set of training metrics of        which respective status indicators are to be included in the        visualization data set.    -   3. The system as recited in any of clauses 1-2, wherein the one        or more computing devices are configured to:    -   identify, based at least in part on a type of classifier that is        to be trained, at least one training metric of which a status        indicator is to be provided in the visualization data set.    -   4. 
The system as recited in any of clauses 1-3, wherein the one or more computing devices are configured to:
    -   present, via the interactive programmatic interface, an indication of one or more of (a) one or more labels provided by a selected label provider whose labels were used as part of the training set for the particular training iteration or (b) a justification provided by the selected label provider for a label of the one or more labels.
    -   5. The system as recited in any of clauses 1-4, wherein the one or more training enhancement actions comprise modifying a size of a pool of label providers from which labeling feedback is to be obtained.
    -   6. A method, comprising:
    -   performing, by one or more computing devices:
        -   determining, corresponding to individual ones of a plurality of classifier training iterations, respective sets of status indicators, wherein a first set of status indicators includes at least a representation of a trend of a particular training metric over a plurality of classifier training iterations;
        -   causing, in response to a request obtained via an interactive programmatic interface, a first visualization data set corresponding to at least one set of status indicators to be presented via the interactive programmatic interface;
        -   initiating, subsequent to presentation of the first visualization data set, one or more training enhancement actions, wherein a particular training enhancement action comprises selecting, based at least in part on an objective associated with a particular status indicator, one or more data items for which respective labeling feedback is to be obtained programmatically in a subsequent classifier training iteration; and
        -   storing a classification model trained using a training set which includes at least one label obtained as a result of implementation of the particular training enhancement action.
    -   7. The method as recited in clause 6, further comprising performing, by one or more computing devices:
    -   obtaining, based at least in part on input received via the interactive programmatic interface, an indication of a set of training metrics of which respective status indicators are to be provided via the interactive programmatic interface, wherein the visualization data set comprises at least one status indicator of the respective status indicators.
    -   8. The method as recited in clause 7, wherein the set of training metrics comprises one or more of: (a) a positive predictive value (PPV), (b) a negative predictive value (NPV), (c) an accuracy, (d) a prevalence, (e) a precision, (f) a false discovery rate, (g) a false omission rate, (h) a recall, (i) a sensitivity, (j) a diagnostic odds ratio, or (k) an F1 score.
    -   9. The method as recited in any of clauses 6-7, further comprising performing, by the one or more computing devices:
    -   providing an indication, in response to input received via the interactive programmatic interface, of a difference in training data sets between a first classifier training iteration and a second classifier training iteration of the plurality of classifier training iterations.
    -   10. 
10. The method as recited in any of clauses 6-7 or 9, further comprising performing, by the one or more computing devices: providing, based at least in part on an analysis of a difference in training data sets between a first classifier training iteration and a second classifier training iteration of the plurality of classifier training iterations, an indication of one or more candidate explanatory factors associated with a difference in training metrics between the first and second classifier training iterations.

11. The method as recited in any of clauses 6-7 or 9-10, further comprising performing, by the one or more computing devices: causing an indication of results of a set of diagnosis tests, with respect to one or more classifier training iterations of the plurality of classifier training iterations, to be presented via the interactive programmatic interface, wherein a first diagnosis test of the set of diagnosis tests comprises determining whether a particular first status indicator meets a threshold condition.

12. The method as recited in clause 11, further comprising performing, by the one or more computing devices: causing, with respect to a particular diagnosis test of the set of diagnosis tests whose result does not meet a threshold criterion, an explanation of a recommended remedial action to be presented via the interactive programmatic interface; determining that the recommended remedial action has been approved; and initiating the recommended remedial action.

13. The method as recited in any of clauses 11-12, further comprising performing, by the one or more computing devices: identifying at least one diagnosis test of the set of diagnosis tests based at least in part on input received via the interactive programmatic interface.

14. The method as recited in any of clauses 11-13, further comprising performing, by the one or more computing devices: detecting that a directive to optimize classifier training iterations in an automated mode has been indicated via the interactive programmatic interface; and automatically initiating, without receiving a request pertaining to a particular diagnosis test whose result does not meet a threshold criterion, a remedial action pertaining to the particular diagnosis test.

15. The method as recited in any of clauses 6-7 or 9-11, further comprising performing, by the one or more computing devices: causing an indication of a difference between (a) a previously-provided label for a particular data item and (b) a predicted class label of the particular data item to be presented via the interactive programmatic interface.
16. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors cause the one or more processors to: determine, corresponding to individual ones of a plurality of classifier training iterations, respective sets of metrics; cause a first visualization data set to be presented via an interactive programmatic interface, wherein the first visualization data set comprises results of a set of one or more diagnosis tests, wherein a first diagnosis test of the set of diagnosis tests comprises determining whether a particular metric of the respective sets of metrics meets a threshold condition; and determine, based at least in part on a result of the first diagnosis test, whether to initiate one or more additional classifier training iterations.

17. The non-transitory computer-accessible storage medium as recited in clause 16, wherein the particular metric includes a count of a respective number of labels corresponding to one or more target classes of a classifier, wherein the labels corresponding to the one or more target classes are part of a training set for a particular classifier training iteration, wherein at least a subset of the labels corresponding to the one or more target classes is obtained in one or more labeling sessions from one or more label providers.

18. The non-transitory computer-accessible storage medium as recited in clause 17, wherein the subset of the labels is obtained asynchronously with respect to a start or end of an individual classifier training iteration of the plurality of classifier training iterations.

19. The non-transitory computer-accessible storage medium as recited in any of clauses 16-17, wherein an individual classifier training iteration of the plurality of classifier training iterations comprises training one or more of: (a) a binary classifier or (b) a multi-class classifier.

20. The non-transitory computer-accessible storage medium as recited in any of clauses 16-17 or 19, wherein the instructions when executed on the one or more processors cause the one or more processors to: determine, based at least in part on programmatic input, the first diagnosis test, wherein the determination to initiate an additional training iteration is based at least in part on programmatic input overriding a result of the first diagnosis test.
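Clauses 1, 8, 11 and 16 above refer to per-iteration status indicators, confusion-matrix training metrics, and threshold-based diagnosis tests, without fixing any particular computation. The sketch below illustrates one plausible way such indicators might be derived; the function names, the confidence threshold, and the choice of trailing standard deviation as the "stability trend" are all illustrative assumptions, not part of the claimed service.

```python
import statistics

LOW_CONFIDENCE_THRESHOLD = 0.75   # assumed cutoff for "does not meet a threshold criterion"
STABILITY_WINDOW = 3              # assumed number of trailing iterations for the trend

def confusion_metrics(tp, fp, tn, fn):
    """Confusion-matrix metrics of the kind enumerated in clause 8."""
    ppv = tp / (tp + fp) if (tp + fp) else 0.0      # positive predictive value / precision
    npv = tn / (tn + fn) if (tn + fn) else 0.0      # negative predictive value
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # recall / sensitivity
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * recall / (ppv + recall) if (ppv + recall) else 0.0
    return {"ppv": ppv, "npv": npv, "recall": recall,
            "accuracy": accuracy, "f1": f1}

def status_indicators(iteration_history, item_confidences):
    """Build one set of status indicators for the current training iteration.

    iteration_history: per-iteration metric dicts, oldest first.
    item_confidences: model confidence for each data item of the first group.
    """
    # (a) fraction of items whose classification results fall below the threshold
    below = sum(1 for c in item_confidences if c < LOW_CONFIDENCE_THRESHOLD)
    low_confidence_fraction = below / len(item_confidences) if item_confidences else 0.0

    # (b) stability trend of a particular metric (F1 here) over recent iterations:
    # the standard deviation of the trailing window; lower values mean more stable
    recent_f1 = [m["f1"] for m in iteration_history[-STABILITY_WINDOW:]]
    f1_stability = statistics.pstdev(recent_f1) if len(recent_f1) > 1 else None

    return {"low_confidence_fraction": low_confidence_fraction,
            "f1_stability": f1_stability}

def diagnosis_test(indicators, max_low_confidence=0.2):
    """A diagnosis test in the sense of clauses 11 and 16: an indicator
    is compared against a threshold condition."""
    return indicators["low_confidence_fraction"] <= max_low_confidence
```

Under this reading, a failed diagnosis test could trigger one of the training enhancement actions of clause 1, such as selecting more low-confidence items for labeling feedback or enlarging the label-provider pool of clause 5.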

CONCLUSION

Various embodiments may further include receiving, sending or storing instructions and/or data implemented in accordance with the foregoing description upon a computer-accessible medium. Generally speaking, a computer-accessible medium may include storage media or memory media such as magnetic or optical media, e.g., disk or DVD/CD-ROM, volatile or non-volatile media such as RAM (e.g. SDRAM, DDR, RDRAM, SRAM, etc.), ROM, etc., as well as transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link.

The various methods as illustrated in the Figures and described herein represent exemplary embodiments of methods. The methods may be implemented in software, hardware, or a combination thereof. The order of the methods may be changed, and various elements may be added, reordered, combined, omitted, modified, etc.

Various modifications and changes may be made as would be obvious to a person skilled in the art having the benefit of this disclosure. It is intended to embrace all such modifications and changes and, accordingly, the above description is to be regarded in an illustrative rather than a restrictive sense.

1.-22. (canceled)
23. A computer-implemented method, comprising: obtaining, at a cloud computing environment, a request to initiate an interactive labeling session for at least a portion of a machine learning data set; presenting, from the cloud computing environment, via one or more programmatic interfaces subsequent to receiving the request, (a) an image representing a first item of the machine learning data set and (b) a first label generated for the first item at the cloud computing environment; and storing, at the cloud computing environment, a second label for the first item, wherein the second label is received via the one or more programmatic interfaces in response to said presenting, and wherein the second label differs from the first label.
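Claim 23 describes an exchange in which the service shows an item's image together with a service-generated label and records a differing label supplied through the interface. A minimal sketch of that exchange follows; the `session` object, its methods, and the item attributes (`image`, `model_label`, `id`) are hypothetical stand-ins for whatever programmatic interface an implementation exposes.

```python
from dataclasses import dataclass

@dataclass
class LabelRecord:
    item_id: str
    model_label: str      # the first label, generated at the service
    reviewer_label: str   # the second label, received via the interface

def run_labeling_step(session, item):
    """Present an image and the service-generated label, then store the
    reviewer's label when it differs (the corrected-label case of claim 23)."""
    session.show_image(item.image)                              # (a) image of the item
    session.show_text(f"Suggested label: {item.model_label}")   # (b) first label
    reviewer_label = session.prompt_for_label()                 # response to the presentation
    if reviewer_label != item.model_label:
        session.store(LabelRecord(item.id, item.model_label, reviewer_label))
    return reviewer_label
```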
24. The computer-implemented method as recited in claim 23, further comprising: assigning respective ranks to a plurality of items of the machine learning data set, including the first item, based at least in part on estimated learning contributions of individual ones of the plurality of items to a training iteration of a machine learning model; and determining, based at least in part on a rank assigned to the first item, to present an indication of the first item via the one or more programmatic interfaces, wherein the indication comprises the image.

25. The computer-implemented method as recited in claim 23, wherein the first label corresponds to a first class of a plurality of classes into which items of the machine learning data set are to be classified, wherein the second label corresponds to a second class of the plurality of classes, the computer-implemented method further comprising: presenting, from the cloud computing environment via the one or more programmatic interfaces, prior to receiving the second label, an indication of a correlation between an attribute of the first item and membership of the first item in a particular class of the plurality of classes.
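Claim 24 turns on ranking items by their estimated contribution to a training iteration, without fixing a ranking function. A common stand-in from the active-learning literature is uncertainty sampling, sketched below with prediction entropy as the score; the entropy heuristic and the names used are assumptions for illustration, not the claimed method.

```python
import math

def prediction_entropy(probs):
    """Entropy of a predicted class distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def rank_by_learning_contribution(items, predict_proba, top_k=10):
    """Rank unlabeled items so that the most uncertain, and plausibly most
    informative, ones are presented for labeling feedback first.

    predict_proba: callable mapping an item to its predicted class probabilities.
    """
    scored = [(prediction_entropy(predict_proba(item)), item) for item in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_k]]
```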
26. The computer-implemented method as recited in claim 23, further comprising: receiving, at the cloud computing environment via the one or more programmatic interfaces, an indication of a justification for the second label.
27. The computer-implemented method as recited in claim 26, further comprising: presenting, from the cloud computing environment via the one or more programmatic interfaces, the indication of the justification.
28. The computer-implemented method as recited in claim 23, wherein the second label is received at the cloud computing environment from a first label provider, the computer-implemented method further comprising: obtaining, at the cloud computing environment from the first label provider via the one or more programmatic interfaces, a filter criterion to be used to select one or more additional items of the portion of the machine learning data set for which labels are to be obtained; and presenting, by the cloud computing environment to the first label provider via the one or more programmatic interfaces, a representation of a second item, wherein the second item is selected from the portion of the machine learning data set using the filter criterion.
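Claim 28 lets the label provider supply a filter criterion that narrows which unlabeled items are shown next. One straightforward realization is a predicate applied over the unlabeled pool, as in the sketch below; the example attribute names (`source`, `confidence`) are invented for illustration.

```python
def select_with_filter(unlabeled_items, filter_criterion):
    """Apply a provider-supplied filter criterion to the unlabeled pool.

    filter_criterion: a predicate over an item, e.g. one assembled from
    attribute constraints entered through the programmatic interface.
    """
    return [item for item in unlabeled_items if filter_criterion(item)]

# Example: the provider asks to see only items whose (assumed) 'source'
# attribute is "camera-7" and whose model confidence is below 0.6.
criterion = lambda item: item.source == "camera-7" and item.confidence < 0.6
```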
29. The computer-implemented method as recited in claim 23, further comprising: training, at the cloud computing environment, in one or more training iterations, a machine learning model using labeled versions of items of the portion of the machine learning data set, wherein a labeled version of the first item which includes the second label is used during a particular training iteration of the one or more training iterations, and wherein the second label is received at the cloud computing environment asynchronously with respect to the particular training iteration.
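Claim 29 notes that labels may arrive asynchronously with respect to training iterations. A queue between the labeling interface and the training loop is one simple arrangement: sessions enqueue labels at any time, and each iteration folds in whatever has accumulated. The sketch below is one such arrangement under those assumptions; all names are illustrative.

```python
import queue

label_queue = queue.Queue()   # filled by labeling sessions at arbitrary times

def drain_new_labels():
    """Collect labels that arrived since the previous iteration boundary."""
    new_labels = []
    while True:
        try:
            new_labels.append(label_queue.get_nowait())
        except queue.Empty:
            return new_labels

def training_loop(train_fn, training_set, max_iterations=10):
    """Each iteration incorporates labels received asynchronously, so label
    arrival need not align with the start or end of any iteration."""
    model = None
    for _ in range(max_iterations):
        training_set.extend(drain_new_labels())
        model = train_fn(training_set)
    return model
```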
30. A system, comprising: one or more computing devices; wherein the one or more computing devices include instructions that upon execution on or across one or more processors cause the one or more processors to: obtain, at a cloud computing environment, a request to initiate an interactive labeling session for at least a portion of a machine learning data set; present, from the cloud computing environment, via one or more programmatic interfaces subsequent to receiving the request, (a) an image representing a first item of the machine learning data set and (b) a first label generated for the first item at the cloud computing environment; and store, at the cloud computing environment, a second label for the first item, wherein the second label is received via the one or more programmatic interfaces in response to presentation of the image and the first label, and wherein the second label differs from the first label.
31. The system as recited in claim 30, wherein the one or more computing devices include further instructions that upon execution on or across the one or more processors further cause the one or more processors to: assign respective ranks to a plurality of items of the machine learning data set, including the first item, based at least in part on estimated learning contributions of individual ones of the plurality of items to a training iteration of a machine learning model; and determine, based at least in part on a rank assigned to the first item, to present an indication of the first item via the one or more programmatic interfaces, wherein the indication comprises the image.

32. The system as recited in claim 30, wherein the first label corresponds to a first class of a plurality of classes into which items of the machine learning data set are to be classified, wherein the second label corresponds to a second class of the plurality of classes, and wherein the one or more computing devices include further instructions that upon execution on or across the one or more processors further cause the one or more processors to: present, from the cloud computing environment via the one or more programmatic interfaces, prior to receiving the second label, an indication of a correlation between an attribute of the first item and membership of the first item in a particular class of the plurality of classes.
33. The system as recited in claim 30, wherein the one or more computing devices include further instructions that upon execution on or across the one or more processors further cause the one or more processors to: receive, at the cloud computing environment via the one or more programmatic interfaces, an indication of a justification for the second label.
34. The system as recited in claim 33, wherein the one or more computing devices include further instructions that upon execution on or across the one or more processors further cause the one or more processors to: present, from the cloud computing environment via the one or more programmatic interfaces, the indication of the justification.
35. The system as recited in claim 30, wherein the second label is received at the cloud computing environment from a first label provider, and wherein the one or more computing devices include further instructions that upon execution on or across the one or more processors further cause the one or more processors to: obtain, at the cloud computing environment from the first label provider via the one or more programmatic interfaces, a filter criterion to be used to select one or more additional items of the portion of the machine learning data set for which labels are to be obtained; and present, by the cloud computing environment to the first label provider via the one or more programmatic interfaces, a representation of a second item, wherein the second item is selected from the portion of the machine learning data set using the filter criterion.
36. The system as recited in claim 30, wherein the one or more computing devices include further instructions that upon execution on or across the one or more processors further cause the one or more processors to: train, at the cloud computing environment, in one or more training iterations, a machine learning model using labeled versions of items of the portion of the machine learning data set, wherein a labeled version of the first item which includes the second label is used during a particular training iteration of the one or more training iterations, and wherein the second label is received at the cloud computing environment asynchronously with respect to the particular training iteration.

37. One or more non-transitory computer-accessible storage media storing program instructions that when executed on or across one or more processors: obtain, at a cloud computing environment, a request to initiate an interactive labeling session for at least a portion of a machine learning data set; present, from the cloud computing environment, via one or more programmatic interfaces subsequent to receiving the request, (a) an image representing a first item of the machine learning data set and (b) a first label generated for the first item at the cloud computing environment; and store, at the cloud computing environment, a second label for the first item, wherein the second label is received via the one or more programmatic interfaces in response to presentation of the image and the first label, and wherein the second label differs from the first label.
38. The one or more non-transitory computer-accessible storage media as recited in claim 37, storing further program instructions that when executed on or across the one or more processors: assign respective ranks to a plurality of items of the machine learning data set, including the first item, based at least in part on estimated learning contributions of individual ones of the plurality of items to a training iteration of a machine learning model; and determine, based at least in part on a rank assigned to the first item, to present an indication of the first item via the one or more programmatic interfaces, wherein the indication comprises the image.
39. The one or more non-transitory computer-accessible storage media as recited in claim 37, wherein the first label corresponds to a first class of a plurality of classes into which items of the machine learning data set are to be classified, wherein the second label corresponds to a second class of the plurality of classes, and wherein the one or more non-transitory computer-accessible storage media store further program instructions that when executed on or across the one or more processors: present, from the cloud computing environment via the one or more programmatic interfaces, prior to receiving the second label, an indication of a correlation between an attribute of the first item and membership of the first item in a particular class of the plurality of classes.
40. The one or more non-transitory computer-accessible storage media as recited in claim 37, storing further program instructions that when executed on or across the one or more processors: receive, at the cloud computing environment via the one or more programmatic interfaces, an indication of a justification for the second label.
41. The one or more non-transitory computer-accessible storage media as recited in claim 40, storing further program instructions that when executed on or across the one or more processors: present, from the cloud computing environment via the one or more programmatic interfaces, the indication of the justification.
42. The one or more non-transitory computer-accessible storage media as recited in claim 37, wherein the second label is received at the cloud computing environment from a first label provider, and wherein the one or more non-transitory computer-accessible storage media store further program instructions that when executed on or across the one or more processors: obtain, at the cloud computing environment from the first label provider via the one or more programmatic interfaces, a filter criterion to be used to select one or more additional items of the portion of the machine learning data set for which labels are to be obtained; and present, by the cloud computing environment to the first label provider via the one or more programmatic interfaces, a representation of a second item, wherein the second item is selected from the portion of the machine learning data set using the filter criterion.